ABSTRACT

As systems become more distributed, they are vulnerable to new forms of attack.
An adversary could seize control of several nodes in a network and reprogram
them, unbeknownst to the rest of the network. Strategies are needed that can
ensure robust performance in the presence of these sorts of attacks. This thesis
studies the adversarial problem in three scenarios.

First  is  the  problem  of  network  coding,  in  which  a  source  seeks  to  send  data  to  a 
destination through a network of intermediate nodes that may perform arbitrarily
complicated  coding  functions.  When  an  adversary  controls  nodes  in  the  network, 
achievable  rates  and  upper  bounds  on  capacity  are  found,  and  Polytope  Codes 
are  introduced,  which  are  a  nonlinear  class  of  codes  specially  designed  to  handle 
adversaries  in  a  network  coding  framework. 

Second,  multiterminal  source  coding  is  studied,  in  which  several  nodes  make 
correlated  measurements,  independently  encode  them,  and  transmit  their  encodings 
to  a  common  decoder,  which  attempts  to  recover  some  information.  Two 
special  cases  of  this  problem  are  studied  when  several  of  the  nodes  may  be  controlled 
by  an  adversary:  the  problem  of  Slepian  and  Wolf,  in  which  the  decoder 
attempts  to  perfectly  decode  all  measurements,  and  the  CEO  Problem,  in  which 
the  decoder  attempts  to  estimate  a  source  correlated  with  the  measurements. 

Finally,  adversarial  attacks  are  studied  against  power  system  sensing  and  estimation. 
In  this  problem,  a  control  center  receives  various  measurements  from 
meters  in  a  power  grid,  and  attempts  to  recover  information  about  the  state  of  the 
system.  Attacks  of  various  degrees  of  severity  are  studied,  as  well  as  countermeasures 
that  the  control  center  may  employ  to  prevent  these  attacks. 
Oliver Eli Kosut, Ph.D.

Cornell  University  2010 

CHAPTER 1

INTRODUCTION 

1.1  Motivation  and  Overview 

Increasingly,  we  are  surrounded  by  distributed  systems  comprised  of  many  nodes 
interacting  with  one  another.  From  the  internet  to  cell  phones  to  sensor  networks, 
everything  is  made  up  of  many  small  pieces.  This  trend  creates  new  security 
problems,  which  require  new  methods  to  build  systems  that  are  robust  against 
various  forms  of  attack.  In  this  thesis,  we  consider  one  potential  form  of  an  attack 
against  a  network,  that  of  a  malicious  adversary  entering  the  network,  seizing  and 
controlling  a  group  of  nodes,  unbeknownst  to  the  rest  of  the  network.  We  study  this 
scenario  in  several  contexts,  analyzing  the  impact  of  the  adversary,  and  designing 
strategies  to  counteract  its  presence.  In  particular,  we  focus  on  two  problems  from 
information  theory:  multiterminal  source  coding  and  network  coding,  in  addition 
to  a  problem  in  power  system  sensing  and  estimation. 

There  are  several  applications  in  communication  networks  in  which  one  user 
may  wish  to  relay  data  through  other  nodes  toward  a  second  user,  when  those 
relay  nodes  may  not  be  reliable  or  trustworthy.  Consider,  for  example,  a  wireless 
ad  hoc  network.  In  such  a  network,  nodes  may  enter  and  exit  the  network  often, 
and messages need to be transmitted through it. Nodes need to learn about each
other,  establish  communication  paths,  and  update  them  as  the  network  changes. 
It  is  easy  to  imagine  that  a  node  could  enter  the  network  without  any  intention  of 
following  the  agreed-upon  protocol.  It  could  at  first  appear  to  act  honestly,  so  as  to 
establish  itself  as  a  relay  point  in  the  network,  but  then  it  could  forward  messages 
incorrectly,  jam  its  neighbor’s  signals,  or  eavesdrop  on  others’  communication. 
Even in the wired setting, nodes may be vulnerable to malicious reprogramming.
Internet  routers  can  be  hacked  into  and  compromised,  or  simply  fail  and  transmit 
unreliable  information.  These  concerns  motivate  our  study  of  network  coding  in 
the  presence  of  adversarial  nodes. 

Network  coding  is  a  concept  in  network  information  theory  that  allows  nodes 
in  a  network  to  perform  potentially  elaborate  operations  to  transmit  data  through 
a  network.  In  Chapter  2,  we  study  this  problem  with  adversarial  nodes.  We  give 
upper  bounds  on  communication  rates,  and  present  a  class  of  nonlinear  codes  called 
Polytope  Codes,  which  is  the  first  class  of  codes  capable  of  achieving  capacity  for 
a  general  class  of  networks  with  adversarial  nodes.  In  particular,  we  show  that 
these  codes  achieve  capacity  for  a  certain  class  of  planar  networks. 

Now  consider  a  sensor  network.  This  could  involve  a  large  number  of  cheap 
nodes  gathering  data  to  be  collected  by  a  central  receiver  that  acts  as  a  fusion 
center,  organizing  and  analyzing  the  aggregate  information.  Should  some  of  the 
sensors  be  seized  by  an  adversary,  the  fusion  center  should  use  strategies  to  make 
its  decisions  robust  against  these  attacks.  The  topology  of  a  sensor  network  made 
up  of  many  nodes  communicating  directly  to  a  single  fusion  center  is  exactly  that 
of  multiterminal  source  coding,  which  is  our  second  major  area  of  study.  We  are 
mostly  interested  in  the  tradeoff  between  the  adversary’s  impact  on  the  quality 
of  the  information  collected  at  the  fusion  center  (referred  to  as  the  decoder  in 
the sequel), and the amount of data that is transmitted from the sensors to the fusion
center.  We  study  two  main  subcases  of  the  multiterminal  source  coding  problem 
with  adversarial  nodes:  in  Chapter  3,  the  problem  of  Slepian  and  Wolf,  in  which 
the  decoder  attempts  to  recover  all  data  that  was  available  at  the  sensors;  and  in 
Chapter  4,  the  CEO  Problem,  in  which  the  decoder  estimates  a  quantity  observed 
by each sensor through a noisy channel. For both these problems, we give achievable
schemes and outer bounds on the sets of achievable communication rates. In
some cases, these bounds match.

Finally, we consider the power system. The power grid in this country, like
most others, serves to deliver reliable electricity to millions of homes and offices,
and its continued operation is a vital part of the infrastructure of our society.
Therefore, any potential vulnerabilities are a serious concern. The system itself is a vast
network of generators, transmission lines, transformers, and switches. It is
important for the continuing operation of the grid that operators in control centers
have reliable up-to-date information about the current state of the system. To this
end, numerous meters are deployed throughout the grid, measuring voltage and/or
power flow. These meters report their findings back to control centers, which use
the gathered data to make decisions. If an adversary were able to manipulate the
meter readings sent to the control center, then it could potentially influence the
trajectory of the power state, and even cause blackouts. In Chapter 5, we present
some results that allow us to identify the parts of the power system that are
vulnerable to these attacks, and detection strategies to foil them if they occur.


1.2  Byzantine  Attack 

The notion of an adversary controlling a subset of nodes in a network, unbeknownst
to the other nodes, is sometimes known as Byzantine attack. The term Byzantine
is conspicuous, and deserves a moment’s explanation. According to Greek legend,
in the 7th century BC lived King Byzas, who in 667 BC founded the city of
Byzantium on the shores of the Bosphorus Strait, part of the waterway connecting
the Mediterranean Sea to the Black Sea, at the location of present-day Istanbul.
Byzantium kept its name and became a chief city of the Roman Empire, which by
the 4th century AD had become so large and difficult to govern that it began to
fracture between east and west. In 330 AD, the emperor Constantine I moved the
capital of the eastern part to Byzantium, and renamed the city Constantinople.
In its day, the eastern empire was usually called the Eastern Roman Empire, or
simply the Roman Empire, since it survived for almost a millennium longer than
the western half. However, partly out of confusion, and partly out of a desire to
differentiate it from the earlier and unified Roman Empire, by the nineteenth
century the eastern empire came to be known by historians as the Byzantine
Empire, even though the empire came into being at the very moment that
Byzantium was renamed.

Long after the empire collapsed with the fall of Constantinople to the Ottomans in
1453, the Byzantine Empire became known for being excessively bureaucratic and
decadent. Hence the word “Byzantine” came to mean overly complicated, hard
to understand, or unnecessarily obtuse. In its entry on the word “byzantine”,
the Oxford English Dictionary cites the 1937 book Spanish Testament by Arthur
Koestler as an early written example of this use of the word. He wrote “In the old
days people often smiled at the Byzantine structure of the Spanish army” [1]. By
the latter half of the 20th century, this meaning of the word was common.

In 1980, Marshall Pease, Leslie Lamport, and Robert Shostak wrote [2], titled
“Reaching Agreement in the Presence of Faults,” which was based partially on
earlier work by Lamport and others [3, 4]. Two years later, the same authors
wrote [5], in which they renamed the same problem “the Byzantine Generals’
Problem.” This version of the problem is described as follows. A number of generals
of the Byzantine army are separately encamped outside an enemy city. They must
come to an agreement about whether to attack the city. They do this by sending
messengers from one to another, indicating each general’s opinion or preference for
their course of action. This is complicated by the fact that some of the generals
are traitors; that is, they may send inconsistent or meaningless messages, and
therefore make it more difficult for the honest generals to reach agreement. The
result of [2] and [5] is that consensus among the honest generals can be reached as
long as fewer than one third of the generals are traitors.
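
As a quick numerical reading of the one-third condition, the following Python sketch computes the largest number of traitors that a given number of generals can tolerate. The function name `max_traitors` is hypothetical and introduced only for illustration; the code encodes nothing beyond the condition "fewer than one third" stated above.

```python
def max_traitors(num_generals):
    """Largest t such that 3 * t < num_generals (fewer than one third traitors)."""
    if num_generals < 1:
        raise ValueError("need at least one general")
    return (num_generals - 1) // 3


# Tolerable number of traitors for a few army sizes.
for n in (3, 4, 7, 10):
    print(n, "generals tolerate", max_traitors(n), "traitor(s)")
```

Note that three generals cannot tolerate even a single traitor under this condition, while four generals can tolerate one.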

On  his  website,  Lamport  describes  the  process  leading  to  the  more  colorful 
naming  of  the  problem: 

I have long felt that, because it was posed as a cute problem
about philosophers seated around a table, Dijkstra’s dining philosopher’s
problem received much more attention than it deserves. (For
example, it has probably received more attention in the theory community
than the readers/writers problem, which illustrates the same
principles and has much more practical importance.) I believed that
the problem introduced in [2] was very important and deserved the
attention of computer scientists. The popularity of the dining philosophers
problem taught me that the best way to attract attention to a
problem is to present it in terms of a story.

There is a problem in distributed computing that is sometimes
called the Chinese Generals Problem, in which two generals have to
come to a common agreement on whether to attack or retreat, but
can communicate only by sending messengers who might never arrive.
I stole the idea of the generals and posed the problem in terms of a
group of generals, some of whom may be traitors, who have to reach
a common decision. I wanted to assign the generals a nationality that
would not offend any readers. At the time, Albania was a completely
closed society, and I felt it unlikely that there would be any Albanians
around to object, so the original title of this paper was The Albanian
Generals Problem. Jack Goldberg was smart enough to realize that
there were Albanians in the world outside Albania, and Albania might
not always be a black hole, so he suggested that I find another name.
The obviously more appropriate Byzantine generals then occurred to
me.

When he says “obviously more appropriate,” he is evidently referring to the fact
that  “Byzantine”  can  describe  the  generals  in  two  ways:  first,  it  is  their  nationality; 
second,  some  of  their  actions  are  undoubtedly  byzantine. 

A critical component of the problem description in the original Byzantine Generals’
Problem is that the traitors may send arbitrary messages to other generals,
and  the  honest  generals  must  reach  agreement  no  matter  what  the  traitors  do. 
This  notion  of  robust  performance  in  the  face  of  arbitrary  behavior  is  at  the  heart 
of  Byzantine  attack,  and  at  the  heart  of  the  adversary  model  for  the  work  in  this 
thesis. 

An important distinction should be made between two possible interpretations
of this sort of model. The interpretation originally intended by [2, 5] is that of
errors; that is, the generals represent identical units which should in principle
produce the same output, unless one suffers from a random fault. If a system is
designed to be robust against Byzantine failures, then it will always come to the
correct decision even if the faulty unit behaves in an arbitrary manner. The second
interpretation, and the one we mostly use in this thesis, is that of a true adversary:
an intelligent entity motivated to defeat the aims of the network if it can, one that
will study the network operation and search for a vulnerability. These two
interpretations are usually mathematically equivalent, but our choice of the second
one does motivate some choices we make in our modeling of the problem. For
example, in our work on network coding, discussed in Chapter 2, we adopt a model
in which the adversary controls nodes in the network. As we will discuss in our
network coding literature review in Section 1.3.1, this differs from some earlier work
on adversarial attacks in network coding. In particular, [6, 7] studied the problem
of an adversary controlling links in the network, as opposed to nodes. They seem
to be using the first interpretation of Byzantine attack, such that adversarial actions
represent errors on communication channels between nodes, and as long as the
number of these errors is small, no matter what each error is, they can guarantee
performance. Our view, instead, is that the attacks represent an adversary taking
control of nodes in a network, and therefore able to alter any transmission made by
those nodes. This leads to a mathematically different problem, and, it turns out, a
harder one.

Another important element in studying Byzantine and adversarial attacks has
to do with placing a limit on the adversary’s power. The problem should be designed
so that successful strategies are robust against attacks of a certain size. Obviously,
if the adversary controls the entire network, then no strategy could ever defeat
it. Therefore, we allow the adversary to perform arbitrary actions, but subject to
being able to control only a certain number of nodes in the network. The honest
users of the network cannot know for certain that the number of nodes to come under
the adversary’s control will not exceed the threshold, but they can at least make
a performance guarantee if it does not exceed the threshold. Should the adversary
size exceed the threshold, performance could degrade. We can therefore think of
the limit on adversary size not as a priori knowledge of the power of the adversary,
but rather as a parameter with which we can trade off robustness to attacks with
performance. As we will see, handling more adversaries requires more redundancy
in the system, which means performance decreases.


1.3  Network  Coding 

1.3.1  Related  Work 

A classical problem in graph theory is the maximal flow problem. That is, given
a directed graph composed of nodes and capacity-limited edges, we wish to find
the flow of maximum size from a source to a sink. A flow is given by a quantity
associated with each edge, representing the amount of some commodity flowing
through that edge, and upper bounded by the edge capacity. Flow must be conserved
at each node, except for the source, which produces the commodity, and
the sink, which consumes it. In 1956, Ford and Fulkerson [8] showed that the
maximum amount of the commodity that can travel from the source to the sink
equals the value of the minimum cut of the graph. This is known as the max-flow
min-cut theorem. By a cut, we mean a way to split the network into two parts,
such that the source is in one part and the sink in the other. The value of a cut is
given by the total capacity of all edges from the part with the source to the part
with the sink. The min-cut is the minimum cut value over all cuts separating the
source from the sink.
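
The max-flow min-cut theorem can be checked numerically on a small example. The Python sketch below uses the Edmonds-Karp augmenting-path method, which is one standard way to compute a maximum flow and is not claimed to be the construction of [8]. The example graph, its node names, and the helper names `max_flow` and `cut_value` are assumptions made purely for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Return (flow value, source side of a minimum cut) via Edmonds-Karp."""
    # Residual capacities, with a zero-capacity reverse edge for every edge.
    residual = {source: {}, sink: {}}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: the nodes still reachable form one side of a min cut.
            return flow, set(parent)
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def cut_value(capacity, source_side):
    """Total capacity of edges leaving the source side of a cut."""
    return sum(c for (u, v), c in capacity.items()
               if u in source_side and v not in source_side)


# Hypothetical five-edge network with source "s" and sink "t".
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
flow, source_side = max_flow(capacity, "s", "t")
print(flow, cut_value(capacity, source_side))
```

On this example the two printed numbers coincide, as the theorem requires: the cut returned by the final search has the same value as the flow.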

Even though the problem studied by Ford and Fulkerson was purely mathematical
in nature (the commodity is an abstract notion, and is often imagined to be,
for example, water flowing through pipes), the result can immediately be applied
to communicating in a network. Nodes in the graph represent machines able to
receive and transmit messages along communication links, which are represented
by edges. The edge capacities represent communication limits of the communication
links. A flow through the graph can be converted into a routing strategy,
whereby the number of data packets received and transmitted by each intermediate
node is given by the flow. In this setup, nodes in the network do nothing except
copy received information to their outgoing communication links.

The classical max-flow min-cut result cannot be applied to the problem of multicast:
the case that a single source wishes to transmit the same message to more
than one destination. Here the “water as information” metaphor breaks down,
because data packets, unlike water, can be duplicated, so a node with an incoming
bit stream can reproduce it on several outgoing links. More significantly, data can
be combined in nontrivial ways. In particular, more intelligent intermediate nodes
can do coding: in principle, a node’s output can be an arbitrary function of its
input. In the landmark paper [9], it was found that if this so-called network coding
is allowed, then for multicast, the min-cut can be achieved to each destination
simultaneously.
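
A standard illustration of this result, not taken from the text above, is the so-called butterfly network. A source wishes to send two bits to two sinks over unit-capacity links. Each sink receives one bit on a direct path, and both sinks share a single bottleneck link for the other. With routing alone, the bottleneck can carry only one of the two bits, so one sink misses a bit; if the bottleneck instead carries the XOR of the two bits, both sinks recover both. The Python sketch below simulates this; the function name `butterfly` and the variable names are hypothetical.

```python
def butterfly(b1, b2):
    """Simulate one use of the butterfly network with XOR coding at the bottleneck."""
    # The source sends b1 down the left branch and b2 down the right branch.
    left, right = b1, b2
    # The shared bottleneck link carries the XOR rather than either bit alone.
    coded = left ^ right
    # Sink 1 sees b1 directly and the coded bit; XOR-ing them yields b2.
    sink1 = (left, left ^ coded)
    # Sink 2 sees b2 directly and the coded bit; XOR-ing them yields b1.
    sink2 = (right ^ coded, right)
    return sink1, sink2


for b1 in (0, 1):
    for b2 in (0, 1):
        print((b1, b2), butterfly(b1, b2))
```

Each sink thus receives two bits per use of the network, matching its min-cut of two, which routing alone cannot deliver to both sinks at once in this topology.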

In the last decade, network coding has become one of the pillars of network
information theory. While the achievability proof used in [9] relied on a random
coding argument over arbitrary coding functions, it was shown in [10] that for
multicast it is sufficient to use only linear codes: that is, the values transmitted
on each link are elements taken from a finite field, and each node takes linear
combinations over that field of its input to produce its output. In [11], an algebraic
framework for network coding was presented, which led to necessary and
sufficient conditions for the success of a linear code in a general setting, as well as
polynomial time encoding and decoding. The idea of random linear network coding
was first suggested in [12] and elaborated in [13]. In this approach, linear coding is
performed with the coefficients chosen randomly; with high probability, the result
is a good network code that can achieve the network capacity for multicast. Random
linear coding does not require an outside authority which knows the complete
network topology in order to design good codes; instead, nodes may work in a
more distributed manner without losing any communication rate. A polynomial
time algorithm for finding good linear network codes was given in [14]. Network
coding for practical use has been studied and/or demonstrated in [15, 16, 17, 18].
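
The idea behind random linear coding can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the scheme of [12] or [13]: it works over the two-element field GF(2), where addition is XOR, and it models only a single receiver collecting random combinations of the source packets rather than a full network of recoding nodes. Practical schemes typically use larger fields so that random combinations are independent with higher probability. The function names `encode` and `decode` are hypothetical.

```python
import random


def encode(packets, rng):
    """Return (coefficients, payload) for one random GF(2) combination."""
    coeffs = [rng.randrange(2) for _ in packets]
    payload = 0
    for c, p in zip(coeffs, packets):
        if c:
            payload ^= p  # addition over GF(2) is bitwise XOR
    return coeffs, payload


def decode(coded, k):
    """Gauss-Jordan elimination over GF(2); None if the combinations lack rank k."""
    rows = [(list(c), p) for c, p in coded]
    for col in range(k):
        pivot = next((i for i in range(col, len(rows)) if rows[i][0][col]), None)
        if pivot is None:
            return None  # not enough independent combinations yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for i in range(len(rows)):
            if i != col and rows[i][0][col]:
                rows[i] = ([a ^ b for a, b in zip(rows[i][0], rows[col][0])],
                           rows[i][1] ^ rows[col][1])
    # The first k rows now have identity coefficients, so payloads are the packets.
    return [rows[i][1] for i in range(k)]


rng = random.Random(0)  # fixed seed so the run is repeatable
packets = [0b1011, 0b0110, 0b1110]
coded, recovered = [], None
for _ in range(200):  # far more draws than are needed in practice
    coded.append(encode(packets, rng))
    recovered = decode(coded, len(packets))
    if recovered is not None:
        break
print(len(coded), recovered == packets)
```

The receiver needs no knowledge of how the combinations were chosen beyond the coefficient vectors carried with each coded packet, which is what allows the nodes to operate in a distributed manner.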

While linear coding is sufficient to achieve capacity for multicast, for some
problems with multiple sources, linear codes are insufficient. It was shown in [19]
that standard network coding problems fall into three categories: (1) coding is
unnecessary, and routing is enough to achieve capacity; (2) linear coding is sufficient,
and optimal linear codes can be found in polynomial time; and (3) determining
whether a linear code can achieve a given communication rate is NP-hard. They
also gave an example of a network in the third category for which a nonlinear code
can outperform any linear code. It was pointed out in [20] that even this code is
not far from linear, and [20] introduced the class of vector linear codes, whereby
several elements from a finite field can be transmitted on each link. However, [21]
provided an example for which even these codes are insufficient for general
multi-source multi-destination problems. The work in this thesis on network coding
with adversaries shows that even for the single-source single-destination problem,
nonlinear network coding is required to achieve capacity. This indicates that the
general adversary problem may differ substantially from the standard network
coding problem.

Another branch of study on network coding involves the so-called entropic region.
For $n$ correlated random variables, one may calculate the joint entropy using
Shannon’s entropy measure for any subset of the variables. There are $2^n - 1$
nontrivial subsets, so any set of variables can be associated with a
$(2^n - 1)$-dimensional vector. Any vector for which there exists such a set of
random variables is called entropic. The closure of the set of all entropic vectors
is often written $\Gamma^*$. Any linear bound on $\Gamma^*$ is known as an
information inequality. The framework of the entropic region and information
inequalities was introduced in [22]. The nonnegativity of conditional entropy and
conditional mutual information compose a set of information inequalities known as
the Shannon type inequalities. It was first shown in [23] that there exist
non-Shannon type inequalities: that is, $\Gamma^*$ is strictly smaller than the set
of vectors satisfying the Shannon type inequalities. It can be shown that any
network coding problem can be expressed in terms of $\Gamma^*$; if $\Gamma^*$
were completely known, then all network coding problems would be immediately
solved. Moreover, it was shown in [24] that non-Shannon type inequalities can
be relevant in network coding problems. This indicates that the general network
coding problem is identical to that of characterizing $\Gamma^*$. In [25], it was
shown that $\Gamma^*$ is identical to the set of group characterizable vectors
derived from subgroups of a finite group. Therefore, so-called coset codes based on
finite groups can in principle solve any network coding problem. Linear codes are
special cases of these codes. An interesting property of our Polytope Codes used to
defeat adversaries, discussed in Chapter 2, is that they do not appear to be special
cases of coset codes.
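
To make the notion of an entropic vector concrete, the following Python sketch computes the joint entropy of every nonempty subset of a small set of discrete random variables, given their joint distribution. The function name `entropy_vector` and the example distribution (two independent fair bits and their XOR) are assumptions chosen for illustration; entropies are measured in bits.

```python
from itertools import combinations
from math import log2


def entropy_vector(pmf, n):
    """Map each nonempty subset of variable indices to its joint entropy in bits.

    pmf maps outcome tuples of length n to probabilities.
    """
    vector = {}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # Marginalize the joint distribution onto the chosen variables.
            marginal = {}
            for outcome, p in pmf.items():
                key = tuple(outcome[i] for i in subset)
                marginal[key] = marginal.get(key, 0.0) + p
            vector[subset] = -sum(p * log2(p) for p in marginal.values() if p > 0)
    return vector


# X and Y are independent fair bits, and Z = X xor Y.
pmf = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
h = entropy_vector(pmf, 3)
for subset, value in h.items():
    print(subset, value)

# One Shannon type inequality: I(X; Y | Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z) >= 0.
print(h[(0, 2)] + h[(1, 2)] - h[(0, 1, 2)] - h[(2,)])
```

For three variables the vector has seven entries, one per nonempty subset. The final line evaluates a conditional mutual information from those entries, which is the kind of linear expression that an information inequality constrains.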

The first consideration of network coding with security concerns was [26], which
considered the problem of an eavesdropper able to overhear the messages sent on
a fixed number of communication links in a network. This was based partially on
the foundational work on information-theoretic security by Shannon [27] as well
as Wyner’s wiretap channel [28]. In [26], it is shown that when the eavesdropper’s
capabilities are always identical, linear codes are sufficient to achieve the highest
possible communication rate without allowing the eavesdropper to learn anything
about the message. The same problem but with communication links of differing
capacity was studied in [29]. In this setup, the eavesdropper has varying power
depending on which links it is able to overhear, and [29] finds that many standard
linear coding techniques fail, and one must be more careful in designing the code
so as to maximize secure communication rate. This is a different sort of adversary
from the ones we consider, but it is a similar finding, in that when the adversary
has different levels of power depending on where it is in the network, the problem
becomes harder.

Adversarial attacks on network coding were first considered in [30], which
looked at detecting adversaries in a random linear coding environment. The first
major work on correcting adversaries in network coding was [6, 7]. This two-part
paper looked at the multicast network coding problem in which the adversary
controls exactly z unit-capacity links in the network. This was introduced as
“network error correction”, and, as mentioned above, considered the errors as channel
failures rather than adversarial actions. In [31], the same problem is studied,
providing distributed and low complexity coding algorithms to achieve the same
asymptotically optimal rates. In addition, [31] looks at two adversary models
slightly different from the omniscient one considered in [6, 7] and in this thesis.
They show that higher rates can be achieved under these alternate models. In our
study of multiterminal source coding, we explore similar ways of slightly reducing
the power of the adversary, but for the rest of this thesis, we always assume the
worst case adversary that is completely omniscient. In [32], a more general view
of the adversary problem is given, whereby the network itself is abstracted into an
arbitrary linear transformation.

These works seek to correct for the adversarial errors at the destination. An
alternative strategy known as the watchdog, studied for wireless network coding
in [33], is for nodes to police downstream nodes by overhearing their messages to
detect modifications. In [34], a similar approach is taken, and the authors find,
as we do, that nonlinear operations similar to ours can be helpful.

The work presented in Chapter 2 on an adversary able to control a fixed number
of nodes in a network rather than a fixed number of edges has previously appeared
in [35, 36]. Simultaneously with this work, a slightly different adversarial network
coding problem was considered in [37, 38]. In these papers, the adversary controls
a fixed number of edges, as in [6, 7], but the edges may have unequal capacities. They
find that this problem also requires nonlinear coding to achieve capacity. It seems
that linear coding is sufficient when the adversary has uniform power, no matter
where it is, as in the unit-capacity edge problem, but when its power can vary,
such as in the node problem or the unequal-edge problem, nonlinear coding may be
required.


1.3.2  Contributions 

Our  primary  contribution  is  a  class  of  network  codes  to  defeat  adversaries  called 
Polytope  Codes.  These  were  originally  introduced  in  [35]  under  the  less  descriptive 
term “bounded-linear codes”. Polytope Codes are nonlinear codes, and they improve
over linear codes by allowing error detection inside the network. This allows
adversaries to be more easily identified, whereby the messages they send can be
ignored. We also prove a cut-set upper bound on achievable rates in networks with
node-based  adversaries.  This  cut-set  bound  is  a  form  of  the  Singleton  bound  [39], 
originally  proved  for  classical  error-correcting  codes.  We  show  that  for  a  class  of 
planar  networks,  Polytope  Codes  can  achieve  the  rate  given  by  this  cut-set  bound, 
which  means  that  they  achieve  the  capacity  for  these  networks.  We  also  show  that 
the  cut-set  bound  is  not  always  achievable,  by  giving  an  example  network  with  a 
strictly  smaller  capacity. 

We  briefly  describe  the  high-level  idea  behind  Polytope  Codes,  because  the 
same  idea  is  at  the  heart  of  our  achievable  results  for  multiterminal  source  coding. 
It is easy to grasp and underlies the majority of this thesis, so we dwell on it
momentarily. Consider three nodes in a network, which we name Xander, Yvaine, and
Zoe  for  convenience.  Let  X  and  Y  be  two  correlated  random  variables  with  joint 
distribution  p(x,y).  Suppose  Xander  and  Yvaine  observe  X  and  Y  respectively, 
and  both  independently  report  their  observation  to  Zoe.  One  or  both  of  them 
may be a traitor, i.e., taking instructions from an adversary, so their transmissions
to Zoe could be incorrect. From her received information, Zoe can estimate the
empirical joint distribution of X and Y, which we denote q(x,y). Since one of
Xander and Yvaine may not be trustworthy, q(x,y) could differ from the true
empirical distribution. However, if both Xander and Yvaine are honest, then
Zoe can expect q(x,y) to be close to, or exactly equal to, p(x,y). Therefore, if q
is not close to p, then Zoe can conclude that one of her friends must be lying.
Note  that  Zoe  may  not  be  able  to  tell  which  person  has  done  so,  but  now  both 
Xander  and  Yvaine  are  suspect,  which  means  that  if  Zoe  can  gather  information 
from  other  nodes,  those  nodes  might  be  more  reliable,  assuming  the  adversary  has 
influence  over  a  limited  number  of  nodes.  Consider  the  situation  also  from  the 
adversary’s perspective. If Xander is a traitor, he has two choices in what he tells
Zoe. He could report a value for X that will cause q to be close to p, or not. If
the  former,  then  he  is  constrained  in  his  choice  for  what  he  tells  Zoe,  which  means 
he  has  reduced  ability  to  cause  damage.  If  the  latter,  he  partially  gives  away  his 
position.  The  key  in  designing  strategies  to  defeat  adversaries  is  to  allow  checks 
to  be  made,  like  the  one  Zoe  made  by  comparing  q  to  p.  The  more  checks,  the 
more  rock-or-hard-place  decisions  the  adversary  must  make,  thereby  diminishing 
its  influence. 
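
Zoe's check can be sketched in a few lines of Python. This is only an illustration under assumed details: the function names, the use of total variation distance, and the tolerance `tol` are hypothetical choices made here, not the comparison rule defined later in the thesis.

```python
from collections import Counter

def empirical_joint(xs, ys):
    """Empirical joint distribution q(x, y) of two equal-length report sequences."""
    if len(xs) != len(ys):
        raise ValueError("reports must have the same length")
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return {pair: c / n for pair, c in counts.items()}

def total_variation(q, p):
    """Total variation distance between two distributions given as dicts."""
    support = set(q) | set(p)
    return 0.5 * sum(abs(q.get(s, 0.0) - p.get(s, 0.0)) for s in support)

def looks_honest(xs, ys, p, tol=0.1):
    """True if q is within tol of p; False means Xander or Yvaine is suspect."""
    return total_variation(empirical_joint(xs, ys), p) <= tol

# Hypothetical source: X is a fair bit and Y equals X with probability 1.
p = {(0, 0): 0.5, (1, 1): 0.5}
honest_x = [0, 1, 0, 1, 1, 0, 0, 1]
honest_y = list(honest_x)
forged_x = [1 - x for x in honest_x]   # a traitorous Xander flips every bit

consistent = looks_honest(honest_x, honest_y, p)
suspicious = not looks_honest(forged_x, honest_y, p)
```

A forged report that keeps q within the tolerance would pass this check, which is exactly the constrained choice described above: the traitor either limits what he reports or reveals that something is wrong.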

The main building block of the Polytope Code is a special class of probability
distributions over polytopes in real vector fields. These distributions produce random
variables like X and Y that are sent through the network. Their empirical
distributions are compared at internal nodes in the network, just as Zoe does. This
allows for error detection inside the network. The special polytope structure over
the real vector field allows the internal comparisons to be particularly effective,
in a way that would not occur with probability distributions over a finite field.


1.4  Multiterminal  Source  Coding 

1.4.1  Related  Work 

Multiterminal source coding was introduced by Slepian and Wolf in [40]. They
considered the situation in which two separate encoders observe correlated random
variables, and each independently transmits an encoded version of its observation
to  a  common  decoder,  which  attempts  to  recover  the  sources  exactly,  with  small 
probability  of  error.  They  found  the  remarkable  result  that  the  sum-rate — the  total 
communication  rate  from  both  encoders  to  the  decoder — can  be  made  as  small  as 
if a single encoder could observe both sources simultaneously and compress them
jointly. A proof of the same result was given in [41]. This
paper used the technique of random binning, whereby a random ensemble of codes
is created by placing each possible observed source sequence into bins uniformly
at random. Given a particular binning, the encoding process consists simply of
transmitting to the decoder the index of the bin containing the observed source
sequence.  With  high  probability,  the  codes  created  by  this  process  are  good. 
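
The binning step can be illustrated with a toy Python sketch. The details are assumptions made for illustration: sequences are binary and short enough to enumerate, a single encoder is shown, the decoder is assumed to know a correlated sequence `y` exactly, and minimum Hamming distance stands in for the joint-typicality decoding used in the actual proof. All function names are hypothetical.

```python
import itertools
import random

def make_binning(n, num_bins, seed=0):
    """Assign every binary sequence of length n to a bin chosen uniformly at random."""
    rng = random.Random(seed)
    return {seq: rng.randrange(num_bins)
            for seq in itertools.product((0, 1), repeat=n)}

def encode(binning, x):
    """The encoder transmits only the index of the bin containing x."""
    return binning[tuple(x)]

def decode(binning, bin_index, y):
    """Pick the sequence in the bin that is closest in Hamming distance to y.

    Assumption: minimum Hamming distance replaces joint-typicality decoding.
    """
    candidates = [seq for seq, b in binning.items() if b == bin_index]
    return min(candidates, key=lambda seq: sum(a != b for a, b in zip(seq, y)))

n, num_bins = 8, 16               # 4 bits sent for 8 source symbols
binning = make_binning(n, num_bins)
x = (0, 1, 1, 0, 1, 0, 0, 1)      # sequence observed by the encoder
y = (0, 1, 1, 0, 1, 0, 1, 1)      # correlated sequence known to the decoder
index = encode(binning, x)
x_hat = decode(binning, index, y)
```

Decoding recovers x only when no other sequence in the same bin is at least as close to y; for a randomly chosen binning this fails with small probability when the number of bins is large enough, which is the sense in which the random codes are good.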

The  notion  of  source  coding  with  side  information  was  studied  in  [42,  43]. 
These  papers  considered  the  same  setup  as  the  Slepian-Wolf  problem,  except  now 
the decoder is interested only in recovering one of the two sources. The other
source and its associated encoder provide only so-called side information, since
they are used only to help recover the target source. The description of the achievable
rate  region  for  this  problem  required  the  use  of  an  auxiliary  random  variable,  which 
represents  a  quantized  or  degraded  version  of  the  side  information. 

A similar problem in the rate-distortion framework was studied in [44]. Here,
the decoder has complete side information (i.e., uncoded), and wishes to recover a
target  source,  but  it  may  accept  some  degradation  in  its  source  estimate,  as  long 
as  the  estimate  satisfies  a  distortion  constraint.  The  solution  of  this  problem  gives 
the  trade-off  between  communication  rate  from  the  encoder,  and  distortion  of  the 
source  estimate  produced  at  the  decoder. 

All the above problems involved at most two sources, but the achievable strategies
used to solve them naturally generalize to many sources, many encoders, and
many distortion constraints. This general achievable scheme is sometimes known
as the Berger-Tung achievable scheme [45, 46]. Another common term for it,
and perhaps a more descriptive one, is quantize-and-bin. The idea is that each encoder
quantizes its measured source, in a manner prescribed by an auxiliary random
variable along the lines of [42, 43]; then, the encoders use random binning, exactly
following the proof of the Slepian-Wolf result in [41]. The achievable rate-distortion
region given by this strategy has a very intuitive form, but unfortunately it is not
always optimal for multiterminal source coding problems. In [47], Körner and Marton
provide a surprisingly simple example for which an achievable strategy strictly
better than Berger-Tung exists. Despite considerable effort, the most general form
of  the  problem  remains  unsolved,  even  for  two  sources.  Still,  a  steadily  growing 
number  of  special  cases  have  been  solved. 

One such special case of the multiterminal source coding problem with additional
structure is known as the CEO Problem. It was introduced for discrete
memoryless sources in [48]. In the CEO Problem (so named because the decoder
represents a company’s CEO who has supposedly dispatched his or her employees
as encoders to gather data and report back), the decoder is interested in recovering
a single source with some distortion, but this source is not directly observed
by any encoder. Instead, the encoders observe noisy versions of the source, such
that the noise for each encoder is conditionally independent given the source. This
conditional independence gives the sources a clean structure that
appears to make the problem more tractable. In [48], it was found that with a large
number  of  encoders  each  observing  the  source  through  the  same  noisy  channel,  the 
distortion  of  the  estimate  found  at  the  decoder  falls  exponentially  fast  with  the 
sum-rate  from  all  the  sources.  Moreover,  they  exactly  characterize  the  optimal 
error  exponent.  Again,  the  achievable  strategy  used  is  Berger-Tung. 

A  significant  sub-class  of  multiterminal  source  coding  is  the  quadratic  Gaussian 
setup.  Here,  sources  are  Gaussian  and  distortion  constraints  are  quadratic.  These 
assumptions tend to make problems more tractable and allow the use of powerful
tools,  such  as  the  entropy  power  inequality,  which  was  originally  stated  by  Shannon 
[49]  and  proved  in  [50,  51].  For  example,  the  complete  rate-distortion  region  for 
the  two-terminal  source  coding  problem  in  the  quadratic  Gaussian  setup  was  found 
in [52].

The quadratic Gaussian CEO Problem was introduced in [53]. In a result along
the lines of that of [48], it is shown that with many encoders measuring a noisy
version of the source with identical noise variance, the achievable distortion falls
asymptotically with the sum-rate like K/R, where R is the sum-rate and K is
a constant depending only on the source characteristics. Moreover, they exactly
characterize K. The exact rate-distortion function for finite sum-rate was found
in [54]. The rate-distortion region for a finite number of sources and nonidentical
encoder measurements was discovered simultaneously in [55] and [56]. All these
results again use only the Berger-Tung strategy to prove achievability. The converse
arguments make heavy use of the entropy power inequality, and follow the
essential argument first proposed in [54], which is also based partially on [57].

There  is  a  modest  amount  of  work  in  the  literature  on  source  coding  under 
adversarial  attack.  Perhaps  the  closest  commonly-studied  relative  is  the  multiple 
descriptions  problem,  introduced  with  early  work  in  [58,  59,  60].  The  problem 
here  is  that  two  encoders  observe  a  single  source.  They  must  each  independently 
transmit  encoded  versions  to  a  common  decoder.  However,  the  transmissions  may 
fail  to  arrive,  so  they  should  be  designed  so  that  each  one  leads  to  a  quality 
estimate,  but  if  both  arrive,  an  even  better  estimate  can  be  produced.  This  problem 
has elements of the idea of an attack on source coding: each encoder needs to be
designed  for  the  possibility  that  other  encoders  may  fail.  A  significant  general 
achievability result was given in [61]. This strategy has been shown to be optimal
in  the  case  that  there  is  no  excess  rate  [62]  and  the  quadratic  Gaussian  case  [57]. 
More  recently,  [63]  studied  so-called  robust  distributed  source  coding.  The  problem 
there  was  somewhat  closer  to  ours:  it  is  in  effect  a  combination  of  the  multiple 
descriptions problem and the CEO problem. Nodes observe noisy versions of the
source, and must encode their observations in such a way that the more encodings
arrive, the better the decoder’s estimate.

Prior versions of the work presented in Chapters 3 and 4 of this thesis have
appeared in [64, 65, 66, 67, 68].

1.4.2  Contributions 

In  Chapter  3  we  consider  the  Slepian-Wolf  problem,  and  in  Chapter  4  the  CEO 
problem,  both  under  adversarial  attack.  For  the  Slepian-Wolf  problem — wherein 
the  decoder  seeks  to  exactly  recover  all  sources  with  small  probability  of  error — we 
exactly  characterize  the  achievable  rate  regions  for  three  setups: 

1. A variable-rate model, in which the decoder can in real time allocate transmission
rate to the encoders. Here, we place a guarantee on the sum-rate
that  will  be  achieved,  but  cannot  promise  exactly  how  this  rate  is  allocated, 
because  it  depends  on  the  actions  of  the  adversary. 

2.  A  randomized  fixed-rate  model,  in  which  the  rate  for  each  encoder  is  fixed 
beforehand,  but  the  encoders  have  private  randomness  that  is  hidden  from 
the  adversary. 

3.  A  deterministic  fixed-rate  model,  in  which  the  encoders  do  not  have  private 
randomness.  This  is  the  most  pessimistic  model,  but  therefore  the  most 
robust against powerful adversaries. Moreover, this model most closely corresponds
to the model used in other chapters of this thesis.

For  all  these  models,  we  allow  a  very  general  model  of  the  information  known  to 
the  adversary.  In  particular,  we  assume  the  adversary  has  access  to  the  output 
of  an  arbitrary  noisy  channel,  which  takes  as  input  the  sources  observed  by  the 
encoders.  This  model  allows  for  an  adversary  that  knows  nothing,  an  adversary 
that knows everything, or anything in between. We also allow for a very general view
of  what  the  decoder  knows  about  which  nodes  the  adversary  may  control  as  well 
as  what  information  the  adversary  has  access  to. 

Our  achievable  strategies  for  the  Slepian-Wolf  problem  are  generalizations  of 
the random binning approach of [41]. The variable-rate achievable strategy for the
first  setup  is  the  most  substantially  different,  in  that  it  involves  numerous  small 
messages  being  sent  between  encoders  and  the  decoder.  After  each  message  the 
decoder  chooses  which  encoder  to  hear  from  next,  thereby  allocating  rate  in  real 
time. 

One peculiarity about the Slepian-Wolf problem in the presence of an adversary
is that it is not reasonable to expect the decoder to recover all the sources
exactly,  as  we  can  without  an  adversary.  This  is  because  an  adversarial  node  may 
simply  choose  not  to  transmit  any  useful  information  about  its  associated  source. 
Moreover,  it  may  not  be  possible  for  the  decoder  to  learn  exactly  which  nodes  are 
the traitors. We therefore require only that the estimates produced by the decoder
are accurate for honest nodes, even if it does not know which ones those are.
This  allows  the  decoder  to  place  a  guarantee  on  the  number  of  correct  estimates 
that  it  produces,  but  it  means  that  the  estimates  are  arguably  not  useful  without 
post-processing. 

This inherent difficulty with the Slepian-Wolf problem motivates our study
of the CEO problem in Chapter 4. The advantage of this problem is that no
single  node  has  a  monopoly  on  any  information  about  the  target  source,  so  we 
can  guarantee  quality  of  the  source  estimate  at  the  decoder  in  all  cases.  We  study 
this  problem  in  both  the  discrete  memoryless  case,  for  which  we  generalize  the 
results of [48], as well as the quadratic Gaussian case, for which we generalize the
results  of  [55,  56].  For  the  discrete  memoryless  problem,  we  present  upper  and 
lower  bounds  on  the  sum-rate  error  exponent  for  many  encoders  with  statistically 
identical  observations.  For  the  quadratic  Gaussian  problem,  we  present  inner  and 
outer  bounds  on  the  rate-distortion  region  for  a  finite  number  of  encoders  with 
nonuniform  measurements. 

For the CEO problem, we focus only on the most pessimistic model, corresponding
to the deterministic fixed-rate model discussed above for the Slepian-Wolf
problem, and assuming the adversary is omniscient. Our achievable results
are derived from a unified achievable scheme for both the discrete memoryless and
quadratic Gaussian problems. Our achievable scheme for the adversarial problem
is a generalization of the non-adversarial Berger-Tung strategy, and can be applied
to a similarly general form of the problem. Our outer bounds are based on
a  specific  type  of  attack  by  the  adversary,  which  can  be  viewed  as  a  form  of  the 
Singleton  bound  [39]. 


1.5 Power System Sensing and Estimation

1.5.1  Related  Work 

Power  system  state  estimation  was  introduced  by  Schweppe,  Wildes,  and  Rom  in 
[69]. State estimation took as input measurements of power flows taken in the
power system and produced an estimate of the voltages and phases on all busses in
the system. Ever since this first introduction of state estimation, it has been necessary
to deal with bad data. Traditionally, bad data were assumed to be caused by
random errors resulting from a fault in a meter and/or its attendant communication
system. These errors are modeled by a change of variance in Gaussian noise,
which leads to an energy ($\ell_2$) detector (see [70, 71, 72, 73, 74]). Another classical
detector is the so-called largest normalized residue (LNR) detector [69, 70], which
has the form of a test on the $\ell_\infty$ norm of the normalized measurement residual.
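
The two classical detectors can be sketched as threshold tests on a normalized residual vector. This is a minimal illustration, assuming the normalized residual has already been computed by the state estimator; the function names, sample values, and thresholds are hypothetical.

```python
def energy_detector(normalized_residual, threshold):
    """Flag bad data when the squared l2 norm of the normalized residual is large."""
    return sum(r * r for r in normalized_residual) > threshold

def lnr_detector(normalized_residual, threshold):
    """Flag bad data when the largest normalized residual (l-infinity norm) is large."""
    return max(abs(r) for r in normalized_residual) > threshold

# Hypothetical normalized residuals: each entry is already divided by its
# standard deviation, so values of a few units are unusual for a healthy meter.
clean = [0.3, -0.5, 0.1, 0.4]
one_bad_meter = [0.3, -0.5, 4.0, 0.4]

energy_flags = (energy_detector(clean, 9.0), energy_detector(one_bad_meter, 9.0))
lnr_flags = (lnr_detector(clean, 3.0), lnr_detector(one_bad_meter, 3.0))
```

Both tests look only at the residual, so data that is corrupted without changing the residual passes either one.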

Observability  is  an  important  consideration  when  measuring  the  system  state. 
A  system  is  observable  only  if  there  are  enough  meters  so  that  there  is  no  bus 
whose voltage could change without having an effect on some meter. The problem
of determining whether the system is observable has been studied in [75, 76]. In [77],
a  purely  topological  condition  for  observability  was  given. 

Recently, Liu, Ning, and Reiter studied the problem in which several meters are
seized  by  an  adversary  that  is  able  to  corrupt  the  measurements  from  those  meters 
as  received  by  the  control  center  [78].  This  differs  from  previous  investigations  of 
the  problem  in  that  the  false  data  at  various  meters  can  be  simultaneously  crafted 
by  the  adversary  to  defeat  the  state  estimator,  as  opposed  to  independent  errors 
caused by random faults. It is observed in [78] that there exist cooperative and
malicious attacks on meters that all known bad data techniques will fail to detect.
The  authors  of  [78]  gave  a  method  to  adjust  measurements  at  just  a  few  meters  in 
the grid in such a way that the bad data detector will fail to perceive the corruption
of  the  data. 

Another  recent  work  that  is  similar  to  ours  is  by  Gorinevsky,  Boyd,  and  Poll 
[79]. They attempt to find a small number of faults in a power system by
formulating a convex problem that is likely to lead to a sparse solution. Their
work is partially inspired by the recent development of compressed sensing and $\ell_1$
minimization techniques [80]. In their problem, the desired sparsity has to do with
the  small  number  of  faults  they  expect  in  the  problem.  In  our  work  on  adversarial 
attacks,  we  expect  a  small  number  of  adversaries  in  the  network;  therefore,  a 
similar  approach  is  applicable. 

Prior versions of our work on power system sensing in the presence of adversaries
have appeared in [81, 82, 83].

1.5.2  Contributions 

In  Chapter  5,  we  present  several  results  extending  the  work  of  [78].  We  note  that 
the  observation  made  therein  can  be  made  even  stronger:  if  an  adversary  has  the 
ability  to  adjust  the  measurements  from  enough  meters,  then  no  algorithm  at  the 
control  center  will  ever  be  able  to  detect  that  an  adjustment  has  been  made.  This 
can  be  viewed  as  a  fundamental  limit  on  the  ability  of  the  classical  formulation 
of  state  estimation  to  handle  cooperative  attacks.  We  also  show  that  there  is  a 
close  relationship  between  the  attacks  described  in  [78]  and  system  observability. 
For  this  reason,  we  refer  to  the  attacks  of  [78]  as  unobservable  attacks.  This 
relationship allows us to extend the topological results of [77] to give an efficient
algorithm to calculate attacks of this nature that require a small number of adversarial
meters.  Our  algorithm  is  based  on  the  special  structure  of  the  power  system,  and 
makes  use  of  techniques  to  efficiently  minimize  submodular  functions  [84,  85,  86]. 
Our algorithm allows an operator of a power system to find the places in which
it  is  most  vulnerable  to  these  attacks. 
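
The following Python sketch illustrates why such an attack cannot be seen in the residual. It rests on assumptions not stated in this section: a linearized measurement model z = Hx + e, an unweighted least-squares estimator (equal noise variances), and an attack vector of the form a = Hc. The matrix `H`, the measurements, and all function names are hypothetical and not taken from any real grid.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def matvec(h, v):
    """Matrix-vector product H v."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]

def least_squares(h, z):
    """Least-squares state estimate: solve (H^T H) x = H^T z."""
    cols = list(zip(*h))
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, z)) for ci in cols]
    return solve(gram, rhs)

def residual(h, z):
    """Measurement residual z - H x_hat."""
    x_hat = least_squares(h, z)
    return [zi - pi for zi, pi in zip(z, matvec(h, x_hat))]

# Hypothetical system with 4 meters and 2 state variables.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
z = [1.02, 1.97, -0.98, 3.05]          # noisy measurements
c = [0.5, -0.3]                        # state perturbation chosen by the adversary
a = matvec(H, c)                       # attack vector a = Hc
z_attacked = [zi + ai for zi, ai in zip(z, a)]

r_clean = residual(H, z)
r_attacked = residual(H, z_attacked)
shift = [xa - xc for xa, xc in
         zip(least_squares(H, z_attacked), least_squares(H, z))]
```

Under these assumptions the residual is unchanged while the state estimate moves by c. In this toy case the vector a is nonzero at every meter; the interest in practice is in choices of c for which a touches only a few meters.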

Unobservable  attacks  may  be  executed  by  the  adversary  only  if  it  controls 
enough  meters.  We  also  study  the  problem  in  the  regime  that  it  is  not  able  to 
perform  this  attack.  Here,  we  develop  a  heuristic  that  allows  us  to  find  attacks 
that  minimize  the  energy  of  the  measurement  residual,  and  therefore  are  likely  to 
cause  the  most  damage.  We  also  present  a  decision  theoretic  formulation  in  which 
the control center attempts to detect malicious data injections by an adversary. The
adversary  has  the  freedom  to  choose  which  meters  it  takes  control  of,  and  what 
sort  of  attack  it  performs;  therefore,  this  detection  problem  cannot  be  formulated 
as  a  simple  hypothesis  test,  and  the  uniformly  most  powerful  test  may  not  exist. 
We  study  the  generalized  likelihood  ratio  test  (GLRT)  for  this  problem.  The 
GLRT  is  not  optimal  in  general,  but  it  is  known  to  perform  well  in  practice  and 
its performance has been shown to be close to optimal when the detector has access
to  a  large  number  of  data  samples  [87,  88,  89].  We  also  find  that  when  there  is 
only  a  single  meter  controlled  by  the  adversary,  the  GLRT  is  identical  to  the  LNR 
detector  [70],  which  provides  some  theoretical  underpinning  to  this  already-in-use 
test. 

For large systems and possibly many adversaries, it is not feasible to implement
the exact GLRT. Instead, we study a convex relaxation based on the $\ell_1$ norm,
which is likely to produce sparse solutions. We perform numerical simulations that
demonstrate that the GLRT and its convex relaxation both outperform traditional
detectors. 


1.6  Organization 

Chapter  2:  We  introduce  node-based  adversarial  attacks  on  network  coding.  We 
give  our  upper  bound  on  achievable  rates.  We  proceed  to  introduce  Polytope  Codes 
through  several  examples,  culminating  in  the  general  theory  and  the  fundamental 
properties.  Then  we  prove  that  Polytope  Codes  achieve  the  capacity  for  a  class  of 
planar  networks.  Finally,  we  provide  an  example  with  capacity  strictly  less  than 
the  cut-set  bound. 

Chapter  3:  We  study  the  Slepian-Wolf  problem  with  adversarial  nodes.  We 
present our model, then give a simple example illustrating it and our basic technique.
We go on to find the exact achievable rate region for the three cases described
above: variable-rate, randomized fixed-rate, and deterministic fixed-rate.

Chapter  4:  We  investigate  the  CEO  problem  under  adversarial  attack,  for 
both  the  discrete  memoryless  case  and  the  quadratic  Gaussian  case.  We  present 
our unified Berger-Tung-like achievable scheme. We apply it to calculate
bounds  on  the  achievable  error  exponent  for  the  discrete  memoryless  case  and  the 
rate  region  for  the  quadratic  Gaussian  case.  Then  we  find  outer  bounds  for  both 
cases. 

Chapter  5:  We  present  our  work  on  power  system  sensing  and  estimation,  in 
the  presence  of  malicious  attacks  on  meters.  We  describe  unobservable  attacks, 
and  prove  the  relationship  between  them  and  system  observability.  We  go  on  to 
use this to derive an efficient algorithm for finding these attacks, and show that it is
able to find optimal attacks. We present a Bayesian formulation of the problem,
which we argue has some advantages as compared with the traditional model.
We give a decision theoretic framework for the problem, and find the
generalized likelihood ratio test that results from it. We perform some numerical
simulations on various detectors for this problem, including the GLRT and its
convex  relaxation. 

Chapter 6: We offer some concluding remarks and thoughts on future
directions.


CHAPTER 2

NODE-BASED  ATTACKS  ON  NETWORK  CODING  AND 

POLYTOPE  CODES 


2.1  Introduction 

This chapter studies network coding in a network with one source and one destination
when any s nodes may be controlled by an adversary. These node-based
attacks differ from the edge-based attacks first considered in [6, 7]. There, the
adversary can control any z unit-capacity links. In [6, 7], it is shown that the capacity
with z adversarial links is exactly 2z less than the min-cut of the network,
which is the capacity with no adversary present. The precise result is quoted in
Sec.  2.4. 

Defeating node-based attacks is fundamentally different from defeating edge-based
attacks. First, the edge problem does not immediately solve the node problem.
Consider, for example, the Cockroach network, shown in Fig. 2.1. Suppose we
wish to handle any single adversarial node in the network. One simple approach
would be to apply the edge result from [6, 7]: no node controls more than two unit-capacity
edges, so we can defeat the node-based attack by using a code that can
handle  an  attack  on  any  two  edges.  However,  note  that  the  achievable  rate  for  this 
network  without  an  adversary  is  4,  so  subtracting  twice  the  number  of  bad  edges 
leaves  us  with  an  achievable  rate  of  0.  As  we  will  show,  the  actual  capacity  of  the 
Cockroach  network  with  one  traitor  node  is  2.  Relaxing  the  node  attack  problem 
to  the  edge  attack  problem  is  too  pessimistic,  and  we  can  do  better  if  we  treat  the 
node  problem  differently. 
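
The arithmetic behind this comparison can be written out explicitly. The symbols $C_{\text{edge}}(z)$ and $C_{\text{node}}$ are shorthand introduced only for this illustration; the min-cut of 4 and the capacity of 2 are the values quoted above.

```latex
\begin{align*}
C_{\text{edge}}(z) &= \text{min-cut} - 2z
  && \text{(edge result of [6, 7])} \\
C_{\text{edge}}(2) &= 4 - 2 \cdot 2 = 0
  && \text{(a node controls at most two unit-capacity edges)} \\
C_{\text{node}} &= 2 > C_{\text{edge}}(2)
  && \text{(Cockroach network, one traitor node)}
\end{align*}
```
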

Node-based attacks and edge-based attacks differ in an even more significant
way.  When  the  adversary  can  control  any  set  of  z  unit-capacity  edges,  it  is  clear 
that  it  should  always  take  over  the  edges  on  the  minimum  cut  of  the  network. 
However,  if  the  adversary  can  control  any  set  of  s  nodes,  it  is  not  so  obvious:  one 
node  may  have  many  more  output  edges  than  another,  so  depending  on  which 
nodes  the  adversary  takes  over,  it  may  control  various  numbers  of  edges.  It  may 
face  a  choice  between  a  node  directly  on  the  min-cut,  but  with  few  output  edges, 
and  a  node  away  from  the  min-cut,  but  with  many  output  edges.  For  example,  in 
the  Cockroach  network,  node  4  has  only  one  output  edge,  but  it  is  on  the  min-cut 
(which is between nodes S, 1, 2, 3, 4, 5 and D); node 1 has two output edges, so it
is  apparently  more  powerful,  but  it  is  also  one  step  removed  from  the  min-cut,  and 
therefore  its  ability  to  influence  the  destination  may  be  limited.  This  uncertainty 
about  where  a  network  is  most  vulnerable  seems  to  make  the  problem  hard.  Indeed, 
we  find  that  linear  network  coding  techniques  fail  to  achieve  capacity,  so  we  resort 
to  nonlinear  codes,  and  in  particular  Polytope  Codes,  to  be  described.  We  further 
discuss  the  relationship  between  the  edge  problem  and  the  node  problem  in  Sec.  2.3, 
in  which  we  show  that  the  edge  problem  is  subsumed  by  the  node  problem. 

Many  achievability  results  in  network  coding  have  been  proved  using  linear 
codes  over  a  finite  field.  In  this  chapter  we  demonstrate  that  linear  codes  are 
insufficient  for  this  problem.  Moreover,  we  develop  a  class  of  codes  called  Polytope 
Codes, originally introduced in [35] under the less descriptive term “bounded-linear
codes”. Polytope Codes are used to prove that a cut-set bound, stated and proved
in  Sec.  2.4,  is  tight  for  a  certain  class  of  networks.  Polytope  Codes  differ  from 
linear  codes  in  three  ways: 


Figure 2.1: The Cockroach Network. All edges have capacity 1. With a single
traitor node, the capacity is 2, but no linear code can achieve a rate higher than
4/3. A proof of the linear capacity is given in Sec. 2.13. A capacity-achieving
linear code supplemented by nonlinear comparisons is given in Sec. 2.6, and a
capacity-achieving Polytope Code is given in Sec. 2.8.

1. Comparisons: A significant tool we use to defeat the adversary is for internal
nodes  in  the  network  to  perform  comparisons:  they  check  whether  their 
received  data  could  have  occurred  if  all  nodes  had  been  honest.  If  not,  then 
a  traitor  must  have  altered  one  of  the  received  values,  in  which  case  it  can 
be  localized.  The  result  of  the  comparison,  a  bit  representing  whether  or 
not  it  succeeded,  can  be  transmitted  downstream  through  the  network.  The 
destination  receives  these  comparison  bits  and  uses  them  to  determine  who 
may  be  the  traitors,  and  how  to  decode.  These  comparison  operations  are 
nonlinear,  and,  as  we  will  demonstrate  in  Sec.  2.6,  incorporating  them  into  a 
standard finite-field linear code can increase the achieved rate. However, even a
code  composed  of  a  linear  code  supplemented  by  these  nonlinear  comparison 
operations  is  insufficient  to  achieve  capacity  for  some  networks;  Polytope 
Codes  also  incorporate  comparisons,  but  of  a  more  sophisticated  variety. 

2. Joint Type Codebooks via Probability Distributions: Unlike usual linear network
codes, Polytope Codes make use of probability distributions. In many
ways they are more like random codes, such as those used in the standard
proof of Shannon’s channel coding theorem, but they differ from these as
well. Each Polytope Code is governed by a joint probability distribution
on a set of random variables, one for each edge in the network. Given the
distribution, codewords are selected to be sequences with joint type exactly
equal to the distribution. Contrast this with randomly generated codewords,
which would, with high probability, have joint type close to the base distribution.
Here we use an entirely deterministic process to generate the codebook:
we simply list all sequences with type equal to the given distribution, and
associate each one with a message. The advantage of this method of code
construction is that an internal node will know exactly what joint type to expect
of its received sequences, because it knows the original distribution. The
comparisons discussed above consist of checking whether the observed joint
type matches the expected distribution. If it does not, then the adversary
must have influenced one of the received sequences, so it can be localized.

3. Distributions over Polytopes: The final difference between classical error control
codes and Polytope Codes (and the one for which the latter are named)
comes from the nature of the probability distributions discussed above. These
distributions are uniform over the set of integer lattice points on polytopes
in real vector spaces. This choice of distribution provides two useful properties.
First, the entropy vector for these distributions can be easily calculated
merely from properties of the linear space in which the polytope sits. In this
sense, they share characteristics with finite-field linear codes. In fact, a Polytope
Code can almost always be used in place of a linear code. The second
useful property has to do with how the comparisons inside the network are
used. The distributions over polytopes are such that if enough comparisons
succeed, the adversary is forced to act as an honest node and transmit correct
information. This property will be elaborated in examples in Sec. 2.7
and Sec. 2.8, as well as stated in its most general form in Sec. 2.9.
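
To make the joint-type codebook and the type-matching comparison concrete, the following minimal Python sketch lists every sequence whose joint type equals a given distribution exactly, and then checks a received pair of sequences against that distribution. The two-edge setting, the toy uniform distribution on binary pairs, and the names `joint_type` and `codebook` are illustrative assumptions, not the construction developed later in the chapter.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def joint_type(*seqs):
    """Empirical joint distribution of aligned sequences, as exact fractions."""
    n = len(seqs[0])
    counts = Counter(zip(*seqs))
    return {sym: Fraction(c, n) for sym, c in counts.items()}


def codebook(dist, n):
    """All length-n sequences of symbol tuples whose joint type equals dist exactly."""
    symbols = sorted(dist)
    book = []
    for seq in product(symbols, repeat=n):
        # seq is a tuple of n symbol tuples; zip(*seq) splits it per edge.
        if joint_type(*zip(*seq)) == dist:
            book.append(seq)
    return book


# Toy distribution (an assumption for illustration): uniform over binary pairs.
dist = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}
book = codebook(dist, 4)

# An internal node seeing both edges compares the observed joint type
# with the distribution it expects.
edge1, edge2 = map(list, zip(*book[0]))
honest_ok = joint_type(edge1, edge2) == dist

edge1[0] ^= 1  # an adversary alters one symbol on the first edge
tampered_ok = joint_type(edge1, edge2) == dist
```

In this toy case any single altered symbol changes the joint type, so the comparison fails. Alterations that preserve the joint type would pass this particular check, which is why the choice of distribution matters.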

Our  main  result,  that  the  cut-set  bound  can  be  achieved  using  Polytope  Codes 
for  a  class  of  planar  networks,  is  stated  in  Sec.  2.5.  Planarity  requires  that  the 
graph  can  be  embedded  in  a  plane  such  that  intersections  between  edges  occur 
only at nodes. This ensures that enough opportunities for comparisons are available,
allowing the code to better defeat adversarial attacks. Before proving
the result in Sec. 2.10, we develop the theory of Polytope Codes through several
examples in Sec. 2.6-2.8; we also discuss some general properties of Polytope
Codes  in  Sec.  2.9. 

In  Sec.  2.11-2.13,  we  provide  some  additional  comments  on  this  problem. 
Sec. 2.11 shows that the cut-set bound is not always tight, by giving an example
with a tighter bound. Sec. 2.12 includes a tighter version of the cut-set bound
than that stated in Sec. 2.4, along with an example illustrating the need for a
more  general  bound.  Sec.  2.13  provides  a  proof  that  linear  codes  are  insufficient 
for  the  Cockroach  network. 


2.2  Problem  Formulation 

Let (V, E) be a directed acyclic graph. We assume all edges are unit-capacity,
and there may be more than one edge connecting the same pair of nodes. One node
in V is denoted S, the source, and one is denoted D, the destination. We wish
to determine the maximum achievable throughput from S to D when any set of s
nodes in $V \setminus \{S, D\}$ are traitors; i.e., they are controlled by the adversary. Given
a rate R and a block-length n, the message W is chosen at random from the set
$\{1, \ldots, 2^{nR}\}$. Because each edge is unit capacity, it holds a value $X_e \in \{1, \ldots, 2^n\}$.

A code is made up of three components:

1.  an  encoding  function  at  the  source,  which  produces  values  to  place  on  all  the 
output  edges  given  the  message, 

2. a coding function at each internal node $i \notin \{S, D\}$, which produces values
to  place  on  all  output  edges  from  i  given  the  values  on  all  input  edges  to  i, 

3. and a decoding function at the destination, which produces an estimate $\hat{W}$
of  the  message  given  the  values  on  all  input  edges. 

Suppose $T \subseteq V \setminus \{S, D\}$ with $|T| = s$ is the set of traitors. They may subvert
the coding functions at nodes $i \in T$ by placing arbitrary values on all the output
edges from these nodes. Let $Z_T$ be the set of values on these edges. For a particular
code, specifying the message W as well as $Z_T$ determines exactly the values on all
edges in the network, in addition to the destination’s estimate $\hat{W}$. We say that a
rate R is achievable if there exists a code operating at that rate with some block-length
n such that for all messages, all sets of traitors T, and all values of $Z_T$,
$\hat{W} = W$. That is, the destination always decodes correctly no matter what the
adversary does. Let the capacity C be the supremum over all achievable rates.


2.3  Node  Problem  vs.  Edge  Problem 

The  first  major  work  on  network  coding  in  the  presence  of  adversaries,  [6,  7], 
studied  the  problem  in  which  a  fixed  number  of  unit-capacity  edges  are  controlled 
by the adversary. A more general form of the problem, in which the adversary
controls a fixed number of edges of possibly differing capacities, was studied in
[37,  38].  We  argue  in  this  section  that  even  the  latter  problem  is  subsumed  by  the 
node  problem  studied  in  this  chapter.  In  fact,  we  prove  a  somewhat  stronger  fact, 
that  the  node  problem  is  equivalent  to  what  we  call  the  limited-node  problem. 

The  limited-node  problem  is  a  generalization  of  the  node  problem,  in  which  a 
special  subset  of  nodes  are  designated  as  potential  traitors,  and  the  code  must  only 
guard against adversarial control of any s of those nodes. Certainly the limited-node
problem subsumes the all-node problem, since we may simply take the set
of potential traitors to be all nodes. Furthermore, it subsumes the unequal-edge
problem studied in [37, 38], because given an instance of the unequal-edge problem,
an equivalent limited-node problem can be constructed as follows: create a new network
with every edge replaced by a pair of edges of equal capacity with a node between
them. Then limit the traitors to be only these interior nodes.
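
The edge-to-node reduction just described can be sketched in a few lines of Python. This is only an illustration under assumed conventions: edges are `(tail, head, capacity)` triples, and the inserted nodes are labeled `("mid", index)`. Neither convention comes from the original text.

```python
def subdivide_edges(nodes, edges):
    """Replace each edge (u, v, cap) by u -> m -> v through a fresh node m.

    Returns the new node list, the new edge list, and the set of inserted
    middle nodes, which become the only potential traitors.
    """
    new_nodes = list(nodes)
    new_edges = []
    potential_traitors = set()
    for idx, (u, v, cap) in enumerate(edges):
        mid = ("mid", idx)  # hypothetical label for the inserted node
        new_nodes.append(mid)
        potential_traitors.add(mid)
        # Both halves keep the capacity of the edge they replace.
        new_edges.append((u, mid, cap))
        new_edges.append((mid, v, cap))
    return new_nodes, new_edges, potential_traitors


nodes = ["S", "a", "D"]
edges = [("S", "a", 2), ("a", "D", 1), ("S", "D", 1)]
new_nodes, new_edges, traitors = subdivide_edges(nodes, edges)
```

A traitor sitting on the middle node of a subdivided edge controls exactly what an adversary controlling the original edge would control, which is the sense in which the two problems are equivalent.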

We  now  show  that  the  all-node  problem  actually  subsumes  the  limited-node 
problem, and therefore also the unequal-edge problem. In Sec. 2.11, we construct
an instance of the limited-node problem for which the cut-set bound is not tight.
Because of the equivalence of these two problems shown in this section, this indicates
that even for the all-node problem, the cut-set bound is not tight in general.

Let (V, E) be a network under a limited-node adversarial attack, where there
may be at most s traitors constrained to be in $U \subseteq V$, and let C be its capacity.
We construct a sequence of all-node problems, such that finding the capacity of
these problems is enough to find that of the original limited-node problem. Let
$(V^{(M)}, E^{(M)})$ be a network as follows. First make M copies of (V, E). That is,
for each $i \in V$, put $i^{(1)}, \ldots, i^{(M)}$ into $V^{(M)}$, and for each edge in E, create M
copies of it connecting the equivalent nodes, each with the same capacity. Then,
for each $i \in U$, merge $i^{(1)}, \ldots, i^{(M)}$ into a single node $i^*$, transferring all edges that
were previously connected to any of $i^{(1)}, \ldots, i^{(M)}$ to $i^*$. Let $C^{(M)}$ be the all-node
capacity of $(V^{(M)}, E^{(M)})$ with s traitors. For large M, this network will be such
that for any $i \notin U$, a traitor taking over one of the respective nodes is almost
useless because it commands such a small fraction of the information flow through
the network. That is, we may almost assume that the traitors will only ever be
nodes in U. This is stated explicitly in the following theorem.
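
The replicated network described above can be built mechanically. The sketch below is a minimal Python illustration, assuming nodes are hashable labels, edges are `(tail, head)` pairs with parallel edges repeated in the list, and merged nodes are labeled `(i, "*")`. The passage does not spell out how the source and destination are treated, so the set of nodes to merge is left as a parameter (`merged`) that the caller chooses.

```python
def replicate_network(nodes, edges, merged, M):
    """Build M copies of a network, merging the copies of every node in `merged`.

    Copy j of an unmerged node i is labeled (i, j); the single merged node
    standing for all copies of i is labeled (i, "*").
    """
    def label(i, j):
        return (i, "*") if i in merged else (i, j)

    new_nodes = []
    for i in nodes:
        if i in merged:
            new_nodes.append((i, "*"))
        else:
            new_nodes.extend((i, j) for j in range(1, M + 1))

    # Every original edge appears once per copy; edges touching a merged
    # node are redirected to that node automatically by label().
    new_edges = [(label(u, j), label(v, j))
                 for (u, v) in edges
                 for j in range(1, M + 1)]
    return new_nodes, new_edges


nodes = ["S", "a", "b", "D"]
edges = [("S", "a"), ("S", "b"), ("a", "D"), ("b", "D")]
rep_nodes, rep_edges = replicate_network(nodes, edges, merged={"a"}, M=3)
```

In this toy instance a traitor at any copy `("b", j)` touches only one of the three parallel copies, while the merged node `("a", "*")` takes part in all of them.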


Theorem 1 For any M, $C^{(M)}$ is related to C by

$$\frac{1}{M} C^{(M)} \le C \le \frac{1}{M - 2s} C^{(M)}. \qquad (2.1)$$

Moreover,

$$C = \lim_{M \to \infty} \frac{1}{M} C^{(M)}, \qquad (2.2)$$

and if $C^{(M)}$ can be computed to arbitrary precision for any M in finite time, then
so can C.


Proof: We first show that $\frac{1}{M} C^{(M)} \le C$. Take any code on $(V^{(M)}, E^{(M)})$ achieving
rate R when any s nodes may be traitors. We use this to construct a code on (V, E)
achieving rate R/M when any s nodes in U may be traitors. We do this by first
increasing the block-length by a factor of M, but maintaining the same number
of messages, thereby reducing the achieved rate by a factor of M. Now, since
each edge in (V, E) corresponds to M edges in $(V^{(M)}, E^{(M)})$, we may place every
value transmitted on an edge in the $(V^{(M)}, E^{(M)})$ code on the
equivalent edge in the (V, E) code. That is, all functions executed by $i^{(1)}, \ldots, i^{(M)}$
are now executed by i. The original code could certainly handle any s traitor nodes
in U. Hence the new code can handle any s nodes in U, since the actions performed
by these nodes have not changed from $(V^{(M)}, E^{(M)})$ to (V, E). Therefore, the new
code on (V, E) achieves rate R/M for the limited-node problem.

Now we show that $C \le \frac{1}{M - 2s} C^{(M)}$. Take any code on (V, E) achieving rate
R. We will construct a code on $(V^{(M)}, E^{(M)})$ achieving rate $(M - 2s)R$. This
direction is slightly more difficult because the new code needs to handle a greater
variety of traitors. The code on $(V^{(M)}, E^{(M)})$ is composed of an outer code and
M copies of the (V, E) code running in parallel. The outer code is an $(M, M - 2s)$
MDS code with coded output values $w_1, \ldots, w_M$. These values form the messages
for the inner codes. Since we use an MDS code, if $w_1, \ldots, w_M$ are reconstructed at
the destination such that no more than s are corrupted, the errors can be entirely
corrected. The jth copy of the (V, E) code is performed by $i^*$ for $i \in U$, and by
$i^{(j)}$ for $i \notin U$. That is, nodes in U are each involved in all M copies of the code,
while nodes not in U are involved in only one. Because the (V, E) code is assumed
to defeat any attack on only nodes in U, if for some j, no nodes $i^{(j)}$ for $i \notin U$
are traitors, then the message $w_j$ will be recovered correctly at the destination.
Therefore, one of the $w_j$ could be corrupted only if $i^{(j)}$ is a traitor for some $i \notin U$.
Since there are at most s traitors, at most s of the $w_1, \ldots, w_M$ will be corrupted, so
the outer code corrects the errors.

From  (2.1),  (2.2)  is  immediate.  We  can  easily  identify  M  large  enough  to 
compute  C  to  any  desired  precision. 
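
As a hedged illustration of this last step, (2.1) can be rearranged to show how quickly the normalized all-node capacity approaches C. The sketch below assumes M > 2s and writes $d_S$ (a symbol introduced here, not in the original) for the number of unit-capacity edges leaving the source, which trivially upper bounds C.

```latex
0 \;\le\; C - \frac{1}{M} C^{(M)}
  \;\le\; \Bigl(\frac{1}{M-2s} - \frac{1}{M}\Bigr) C^{(M)}
  \;=\; \frac{2s}{M-2s}\cdot\frac{C^{(M)}}{M}
  \;\le\; \frac{2s}{M-2s}\, C
  \;\le\; \frac{2s\, d_S}{M-2s}.
```

Under these assumptions, any $M \ge 2s(1 + d_S/\epsilon)$ makes the gap between $C^{(M)}/M$ and C at most $\epsilon$, so the gap vanishes as M grows.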


□ 




2.4  Cut-Set  Upper  Bound 

It is shown in [6, 7] that, if an adversary controls z unit-capacity edges, the network
coding capacity reduces by 2z. This is a special case of a more general principle:
an adversary-controlled part of the network does twice as much damage in rate as
it would if that part of the network were merely removed. In particular, the following
theorem, proved in [6, 7], gives the capacity for multicast and an adversary
controlling z unit-capacity edges:

Theorem 2 (Theorem 4 in [6] and Theorem 4 in [7]) In a multicast problem
with source S and destinations $D_1, \ldots, D_K$, the network coding capacity with
an adversary capable of controlling any z unit-capacity edges is

$$C = \min_{k} \operatorname{mincut}(S; D_k) - 2z. \qquad (2.3)$$

Moreover, the capacity can be achieved using linear codes.

The  doubling  effect  seen  in  (2.3)  is  for  the  same  reason  that,  in  a  classical  error 
correction  code,  the  Hamming  distance  between  codewords  must  be  at  least  twice 
the  number  of  errors  that  can  be  corrected;  this  is  the  Singleton  bound  [39].  We 
now  give  a  cut-set  upper  bound  for  node-based  adversaries  in  network  coding  that 
makes  this  explicit. 

A cut in a network is a subset of nodes $A \subset V$ containing the source but not
the destination. The cut-set upper bound on network coding without adversaries
is the sum of the capacities of all forward-facing edges [9]; that is, edges (i, j) with
$i \in A$ and $j \notin A$. All backward edges are ignored.

In  the  adversarial  problem,  backward  edges  are  more  of  a  concern.  This  is 
because the argument relies on values on certain edges crossing the cut being unaffected
by changes in the values on other edges crossing the cut. This is not
guaranteed in the presence of a backward edge. We give an example of the complication
in Sec. 2.12. To avoid the issue, we state here Theorem 3, a simplified
cut-set  bound  that  applies  only  to  cuts  without  backward  edges.  This  bound  will 
be  enough  to  prove  our  main  result,  stated  in  Sec.  2.5,  giving  the  capacity  of  a 
certain  class  of  networks,  but  for  the  general  problem  Theorem  3  can  be  tightened. 
We expand on the issue of backward edges, and state a tighter version of the cut-set
bound in Sec. 2.12. Unlike the problem without adversaries, we see that there
is not necessarily a single cut-set bound. Some more elaborate cut-set bounds are
found in [37, 38]. These papers study the unequal-edge problem, but the bounds can
be readily applied to the node problem. It was originally conjectured in [37] that
even  the  best  cut-set  bound  is  not  tight  in  general.  In  Sec.  2.11,  we  demonstrate 
that  there  can  be  an  active  upper  bound  fundamentally  unlike  a  cut-set  bound. 
The  example  used  to  demonstrate  this,  though  it  is  a  node  adversary  problem,  can 
be  easily  modified  to  confirm  the  conjecture  stated  in  [37]. 

Theorem 3 Consider a cut $A \subseteq V$ with the source S in A, the destination D not
in A, and with no backward edges; that is, there is no edge $(i, j) \in E$ with $i \notin A$
and $j \in A$. If there are s traitor nodes, then for any set $T \subseteq V$ with $|T| = 2s$, the
following upper bound holds on the capacity of the network:

$$C \le |\{(i, j) \in E : i \in A \setminus T,\ j \notin A\}|. \qquad (2.4)$$

Proof: Divide T into two disjoint sets $T_1$ and $T_2$ with $|T_1| = |T_2| = s$. Let
$E_1$ and $E_2$ be the sets of edges out of nodes in $T_1$ and $T_2$ respectively that cross
the cut; that is, edges (i, j) with $i \in A \cap T_1$ or $i \in A \cap T_2$, and $j \notin A$. Let $\bar{E}$ be
the set of all edges crossing the cut not out of nodes in $T_1$ or $T_2$. Observe that the
upper bound in (2.4) is precisely the total number of edges in $\bar{E}$. Since there are
no backward edges for the cut A, the values on edges in $\bar{E}$ are not functions of
the values on edges in $E_1$ or $E_2$. In particular, if the adversary alters a value on
an edge in $E_1$ or $E_2$, it cannot change the values in $\bar{E}$.

Suppose (2.4) does not hold. Then there would exist a code with block-length
n achieving a rate R higher than the right hand side of (2.4). For any set of edges
$F \subseteq E$, for this code, we can define a function

$$X_F : 2^{nR} \to \prod_{e \in F} 2^n \qquad (2.5)$$

such that for a message w, assuming all nodes act honestly, the values on edges
in F are given by $X_F(w)$. Since R is greater than $|\bar{E}|$, there exist two messages $w_1$
and $w_2$ such that $X_{\bar{E}}(w_1) = X_{\bar{E}}(w_2)$.

We demonstrate that it is possible for the adversary to confuse the message $w_1$
with $w_2$. Suppose $w_1$ were the true message, and the traitors are $T_1$. The traitors
replace the messages going along edges in $E_1$ with $X_{E_1}(w_2)$. If there are edges
out of nodes in $T_1$ that are not in $E_1$ (i.e., they do not cross the cut), the traitors
do not alter the messages on these edges from what would be sent if they were
honest. Thus, the values sent along edges in $\bar{E}$ are given by $X_{\bar{E}}(w_1)$. Now suppose
$w_2$ were the true message, and the traitors are $T_2$. They now replace the messages
going along edges in $E_2$ with $X_{E_2}(w_1)$, again leaving all other edges alone, meaning
that the values on $\bar{E}$ are $X_{\bar{E}}(w_2) = X_{\bar{E}}(w_1)$. Note that in both these cases, the
values on $E_1$ are $X_{E_1}(w_2)$, the values on $E_2$ are $X_{E_2}(w_1)$, and the values on $\bar{E}$ are
$X_{\bar{E}}(w_1)$. This comprises all edges crossing the cut, so the destination receives the
same values under each case; therefore it cannot differentiate $w_1$ from $w_2$. □
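
The existence of the two colliding messages used in the proof is a counting (pigeonhole) step, which can be spelled out as follows. Here $\bar{E}$ denotes the set of cut edges that do not leave nodes in $T_1$ or $T_2$, and R is assumed to exceed the number of such edges.

```latex
\bigl|\{1,\ldots,2^{nR}\}\bigr| \;=\; 2^{nR}
  \;>\; 2^{n|\bar{E}|}
  \;=\; \Bigl|\prod_{e \in \bar{E}} \{1,\ldots,2^{n}\}\Bigr|
\quad\Longrightarrow\quad
\exists\, w_1 \neq w_2 :\; X_{\bar{E}}(w_1) = X_{\bar{E}}(w_2).
```

In words, there are more messages than possible value assignments on the edges in $\bar{E}$, so the honest-network map from messages to those values cannot be injective.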

Figure 2.2: The Cockroach Network. All edges have capacity 1. With one traitor,
the cut-set bound of Theorem 3 gives an upper bound on capacity of 2 by setting
A = {S, 1, 2, 3} and T = {1, 2}.

We illustrate the use of Theorem 3 on the Cockroach network, reproduced in
Fig. 2.2, with a single adversary node. To apply the bound, we choose a cut A
and a set T with |T| = 2s = 2, since we consider a single traitor node. Take
A = {S, 1, 2, 3, 4, 5}, and T = {1, 4}. Four edges cross the cut, but the only ones
not out of nodes in T are (3, D) and (5, D), so we may apply Theorem 3 to give an
upper bound on capacity of 2. Alternatively, we could take A = {S, 1, 2, 3} and
T = {1, 2}, to give again an upper bound of 2. Note that there are 6 edges crossing
this  second  cut,  even  though  the  cut-set  bound  is  the  same.  It  is  not  hard  to  see 
that  2  is  the  smallest  upper  bound  given  by  Theorem  3  for  the  capacity  of  the 
Cockroach  network.  In  fact,  rate  2  is  achievable,  as  will  be  shown  in  Sec.  2.6  using 
a  linear  code  supplemented  by  comparison  operations,  and  again  in  Sec.  2.8  using 
a  Polytope  Code. 
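
The claim that 2 is the smallest bound Theorem 3 gives for this network can be checked by brute force. The following Python sketch enumerates every cut without backward edges and every pair of candidate traitors. The edge list is reconstructed from the edges named in the text and is an assumption: in particular, the number of parallel edges from S to nodes 1-3 is not stated in this passage, and two are assumed here because each of those nodes must forward two independent unit-rate values. The set T is drawn from the internal nodes only, since S and D cannot be traitors.

```python
from itertools import combinations

# Assumed Cockroach network; parallel edges are repeated in the list.
EDGES = [("S", 1), ("S", 1), ("S", 2), ("S", 2), ("S", 3), ("S", 3),
         (1, "D"), (1, 4), (2, 4), (2, 5), (3, 5), (3, "D"),
         (4, "D"), (5, "D")]
INTERNAL = (1, 2, 3, 4, 5)


def has_backward_edge(A, edges):
    """True if some edge enters the cut A from outside."""
    return any(u not in A and v in A for u, v in edges)


def bound_for(A, T, edges):
    """Right-hand side of (2.4): forward edges of A whose tail is not in T."""
    return sum(1 for u, v in edges if u in A and u not in T and v not in A)


def best_cut_set_bound(edges, internal, s):
    """Smallest bound over all cuts without backward edges and all T with |T| = 2s."""
    best = None
    for r in range(len(internal) + 1):
        for extra in combinations(internal, r):
            A = {"S", *extra}
            if has_backward_edge(A, edges):
                continue
            for T in combinations(internal, 2 * s):
                b = bound_for(A, set(T), edges)
                if best is None or b < best:
                    best = b
    return best


first = bound_for({"S", 1, 2, 3, 4, 5}, {1, 4}, EDGES)
second = bound_for({"S", 1, 2, 3}, {1, 2}, EDGES)
smallest = best_cut_set_bound(EDGES, INTERNAL, s=1)
```

Under the assumed edge list, both cuts discussed in the text evaluate to 2, and no admissible cut gives a smaller value.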


2.5  Capacity  of  A  Class  of  Planar  Networks 


Theorem  4  Let  (V,  E )  be  a  network  with  the  following  properties: 


1. It is planar.

2. No node other than the source has more than two unit-capacity output edges.

3.  No  node  other  than  the  source  has  more  output  edges  than  input  edges. 

If s = 1, the cut-set bound given by Theorem 3 is tight for this network.

Polytope  Codes  are  used  to  prove  achievability  for  this  theorem.  The  complete 
proof  is  given  in  Sec.  2.10,  but  first  we  develop  the  theory  of  Polytope  Codes  by 
means of several examples in Sec. 2.6-2.8 and general properties in Sec. 2.9.

Perhaps  the  most  interesting  condition  in  the  statement  of  Theorem  4  is  the 
planarity  condition.  Recall  that  a  graph  is  said  to  be  embedded  in  a  surface  (usually 
a  two-dimensional  manifold)  when  it  is  drawn  in  this  surface  so  that  no  two  edges 
intersect.  A  graph  is  planar  if  it  can  be  embedded  in  the  plane  [90]. 


2.6  A  Linear  Code  with  Comparisons  for  the  Cockroach 
Network 

The Cockroach network satisfies the conditions of Theorem 4. Fig. 2.1 shows a
plane  embedding  with  both  S  and  D  on  the  exterior,  and  the  second  condition  is 
easily  seen  to  be  satisfied.  Therefore,  since  the  smallest  cut-set  bound  given  by 
Theorem  3  for  a  single  traitor  node  is  2,  as  we  have  discussed,  Theorem  4  claims 
that  the  capacity  of  the  Cockroach  network  is  2.  In  this  section,  we  present  a 
capacity-achieving  code  for  the  Cockroach  network  that  is  a  linear  code  over  a 
finite field supplemented by nonlinear comparisons. This illustrates the usefulness
of comparisons in defeating adversaries against network coding. Before doing so, we
provide an intuitive argument that linear codes are insufficient. A more technical
proof  that  the  linear  capacity  is  in  fact  4/3  is  given  in  Sec.  2.13. 

Is  it  possible  to  construct  a  linear  code  achieving  rate  2  for  the  Cockroach 
network? We know from the Singleton bound-type argument (the argument at
the heart of the proof of Theorem 3) that, in order to defeat a single traitor node,
if we take out everything controlled by two nodes, the destination must be able to
decode from whatever remains. Suppose we take out nodes 2 and 3. These nodes
certainly control the values on (5, D) and (3, D), so if we hope to achieve rate 2,
the values on (1, D) and (4, D) must be incorruptible by nodes 2 and 3. Edge
(1, D) is not a problem, but consider (4, D). With a linear code, the value on this
edge is a linear combination of the values on (1, 4) and (2, 4). In order to keep
the value on (4, D) incorruptible by node 2, the coefficient used to construct the
value on (4, D) from (2, 4) must be zero. In other words, the value on (1, 4) should
be merely forwarded to (4, D). By a symmetric argument removing nodes 1 and 2,
the value on (3, 5) should be forwarded to (5, D). But now we can remove nodes
1  and  3,  and  control  everything  received  by  the  destination.  Therefore  no  linear 
code  can  successfully  achieve  rate  2. 

This  argument  does  not  rigorously  show  that  the  linear  capacity  is  less  than  2, 
because  it  shows  only  that  a  linear  code  cannot  achieve  exactly  rate  2,  but  it  does 
not  bound  the  achievable  rate  with  a  linear  code  away  from  2.  However,  it  is  meant 
to  be  an  intuitive  explanation  for  the  limitations  of  linear  codes  for  this  problem, 
as  compared  with  the  successful  nonlinear  codes  that  we  will  subsequently  present. 
The  complete  proof  that  the  linear  capacity  is  4/3  is  given  in  Sec.  2.13. 

Figure 2.3: A nonlinear code for the Cockroach Network achieving the capacity of
2.

We now introduce a nonlinear code to achieve the capacity of 2. We work in
the finite field of p elements. Let the message w be a 2k-length vector split into
two k-length vectors x and y. We will use a block length large enough to place
one of $2p^k$ values on each link. In particular, this is enough to place on a link some
linear combination of x and y plus one additional bit. For large enough k, this extra bit
becomes insignificant, so we still achieve a rate of 2.
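
The statement that the extra bit becomes insignificant can be made explicit with a short calculation. The sketch below ignores integer rounding of the block length n, and uses the definitions from Sec. 2.2: each link carries one of $2^n$ values and there are $2^{nR}$ messages.

```latex
n = \log_2\!\bigl(2p^{k}\bigr) = 1 + k\log_2 p,
\qquad
nR = \log_2\!\bigl(p^{2k}\bigr) = 2k\log_2 p,
\qquad
R = \frac{2k\log_2 p}{1 + k\log_2 p} \;\xrightarrow[k\to\infty]{}\; 2.
```

Every finite k gives a rate strictly below 2, which is consistent with the capacity being defined as a supremum over achievable rates.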

The scheme is shown in Figure 2.3. Node 4 receives the vector y from both 1
and 2. It forwards one of these copies to D (it does not matter which). In addition,
it performs a nonlinear comparison between the two received copies of y, resulting
in one additional bit that takes one of the special symbols = or ≠. If the two
received copies of y agree, it forwards =; otherwise it sends ≠. The link (4, D) can
accommodate this, since it may have up to $2p^k$ messages placed on it. Node 5 does
the same with its two copies of the vector x + y.

The destination’s decoding strategy depends on which of the two comparison
bits sent from nodes 4 and 5 are = or ≠, as follows:

• If the bit from node 4 is ≠ but the bit from 5 is =, then the traitor must be
either node 1, 2, or 4. In any case, the vector x - y received from node 3 is
certainly trustworthy. However, x + y is trustworthy as well, because even if
node 2 is the traitor, its transmission must have matched whatever was sent
by node 3, because if not node 5 would have transmitted ≠. Since it did not,
the destination can trust both x + y and x - y, from which it can decode the
message w = (x, y).

• If the message from 5 is ≠ but the message from 4 is =, then we are in the
symmetric situation and can reliably decode w from x and y.

• If both the messages from 4 and 5 are ≠, then the traitor must be node 2,
in which case x and x - y are trustworthy, so the destination can decode w.

• If both messages are =, then the destination cannot eliminate any node as
a possible traitor. However, at most one of x, y, x + y, x - y can have been
corrupted by the traitor, because no node controls more than one of the
vectors received at the destination. For instance, if node 1 is the traitor, it
may choose whatever it wants for x, and the destination would never know.
However, node 1 cannot impact the value of y without inducing a ≠, because
its transmission to node 4 is verified against that from node 2. Similarly,
node 3 controls x - y but not x + y. Nodes 4 and 5 control only y and x + y
respectively. Node 2 controls nothing, because both y and x + y are checked
against other transmissions. Therefore, if the destination can find three of
x, y, x + y, x - y that all agree on the message w, then this message must
be the truth because only one of them could be corrupted, and w can be
decoded from the other two. Conversely, there must be a group of three of
x, y, x + y, x - y that agree, because at most one has been corrupted. Hence,
the destination can always decode w.
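
The four decoding cases above can be exercised with a small simulation. The following Python sketch is an illustration under simplifying assumptions: it takes k = 1 (x and y are single field elements), uses the small odd prime p = 5 so that x + y and x - y together determine (x, y), and has nodes 4 and 5 forward the copies received from nodes 1 and 3 respectively. The function names and the brute-force search in the last case are choices made here, not part of the original scheme.

```python
from itertools import product

P = 5                  # illustrative odd prime; 2 must be invertible mod P
INV2 = pow(2, -1, P)   # modular inverse of 2 (Python 3.8+)


def run_network(x, y, traitor=None, forged=None):
    """Return what D receives: (x from node 1, node 4 output, node 5 output, x-y from node 3).

    `forged` is the pair of arbitrary outputs placed by the traitor node.
    """
    out1 = (x, y)                        # node 1: x to D, y to node 4
    out2 = (y, (x + y) % P)              # node 2: y to node 4, x+y to node 5
    out3 = ((x + y) % P, (x - y) % P)    # node 3: x+y to node 5, x-y to D
    if traitor == 1:
        out1 = forged
    if traitor == 2:
        out2 = forged
    if traitor == 3:
        out3 = forged
    out4 = (out1[1], out1[1] == out2[0])  # forwarded copy of y, comparison bit
    out5 = (out3[0], out2[1] == out3[0])  # forwarded copy of x+y, comparison bit
    if traitor == 4:
        out4 = forged
    if traitor == 5:
        out5 = forged
    return out1[0], out4, out5, out3[1]


def decode(vx, out4, out5, vd):
    (vy, eq4), (vs, eq5) = out4, out5
    if not eq4 and eq5:        # trust x+y and x-y
        x = (vs + vd) * INV2 % P
        return x, (vs - x) % P
    if eq4 and not eq5:        # symmetric case: trust x and y
        return vx, vy
    if not eq4 and not eq5:    # traitor must be node 2: trust x and x-y
        return vx, (vx - vd) % P
    # Both comparisons succeeded: at most one received value is corrupted,
    # so look for the message consistent with at least three of the four.
    for x, y in product(range(P), repeat=2):
        agree = ((vx == x) + (vy == y)
                 + (vs == (x + y) % P) + (vd == (x - y) % P))
        if agree >= 3:
            return x, y
    return None


# Node 1 lies about x but reports y honestly to node 4.
print(decode(*run_network(3, 4, traitor=1, forged=(0, 4))))  # prints (3, 4)
```

With k = 1 the comparison bit is not negligible relative to the payload, so this toy version does not itself achieve rate 2; it only demonstrates that the decoding rule recovers the message against any single traitor.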

Even though our general proof of Theorem 4 uses a Polytope Code, which differs
significantly from this one, the manner in which the comparisons come into play is
essentially the same. The key insight is to consider the code from the perspective
of  the  traitor.  Suppose  it  is  node  1,  and  consider  the  choice  of  what  value  for  y 
to  send  along  edge  (1,4).  If  it  sends  a  false  value  for  y ,  then  the  comparison  at 
node  4  will  fail,  which  will  lead  the  destination  to  consider  the  upper  part  of  the 
network  suspect,  and  thereby  ignore  all  values  influenced  by  node  1.  The  only 
other  choice  for  node  1  is  to  cause  the  comparison  at  node  4  to  succeed;  but  this 
requires  sending  the  true  value  of  y,  which  means  it  has  no  hope  to  corrupt  the 
decoding  process.  This  is  the  general  principle  that  makes  our  codes  work:  force 
the  traitor  to  make  a  choice  between  acting  like  an  honest  node,  or  acting  otherwise 
and  thereby  giving  away  its  position. 

We  make  one  further  note  on  this  code,  having  to  do  with  why  the  specific 
approach  used  here  for  the  Cockroach  network  fails  on  the  more  general  problem. 
Observe  that  in  order  to  make  an  effective  comparison,  the  values  sent  along  edges 
(1,4)  and  (2,4)  needed  to  be  exactly  the  same.  If  they  had  been  independent 
vectors,  no  comparison  could  be  useful.  This  highly  constrains  the  construction  of 
the  code,  and  even  though  it  succeeds  for  this  network,  it  fails  for  others,  such  as 
the  Caterpillar  network,  to  be  introduced  in  the  next  section.  The  advantage  of 
the Polytope Code is that it relaxes the constraints on the types of values that must be
available in order to form a useful comparison; in fact, it becomes possible to have useful
comparisons between nearly independent variables, which is not possible with a
code built on a finite field.


Figure  2.4:  The  Caterpillar  Network.  One  node  may  be  a  traitor,  but  only  one  of 
the  black  nodes:  nodes  1-4. 

2.7  An  Example  Polytope  Code:  The  Caterpillar  Network 

The  Caterpillar  Network  is  shown  in  Figure  2.4.  We  consider  a  slightly  different 
version  of  the  node-based  Byzantine  attack  on  this  network:  at  most  one  node 
may  be  a  traitor,  but  only  nodes  1-4.  This  network  is  not  in  the  class  defined  in 
the  statement  of  Theorem  4,  but  we  introduce  it  in  order  to  motivate  the  Polytope 
Code. 

Even  though  this  problem  differs  from  the  one  defined  earlier  in  that  not  every 
node  in  the  network  may  be  a  traitor,  it  is  easy  to  see  that  we  may  still  apply  the 
cut-set  bound  of  Theorem  3  as  long  as  we  take  the  set  T  to  be  a  subset  of  the 
allowable  traitors.  If  we  apply  Theorem  3  with  A  =  {S,  1,2,  3, 4}  and  T  =  {1,2}, 
we  find  that  the  capacity  of  this  network  is  no  more  than  2.  As  we  will  show,  the 
capacity  is  2. 

Before  we  demonstrate  how  rate  2  is  achieved,  consider  what  is  required  to  do 
so  for  this  network.  Of  the  four  values  on  the  edges  (1,  5),  (2,  6),  (3,  7),  (4,  8),  one 
may  be  corrupted  by  the  adversary.  This  means  that  these  four  values  must  form  a 
(4,  2)  MDS  code.  That  is,  given  any  uncorrupted  pair  of  these  four  values,  it  must 
be possible to decode the message exactly. Since each edge has capacity 1, in order
to achieve rate 2, the values on each pair of edges must be independent, or nearly
independent. For example, we could take the message to be composed of two
elements x, y from a finite field, and transmit on these four edges x, y, x + y, x - y.
However, as we will argue, this choice does not succeed.

Now  consider  the  two  edges  (9,  D)  and  (10,  D).  As  these  are  the  only  edges 
incident  to  the  destination,  to  achieve  rate  2,  both  must  hold  values  guaranteed  to 
be  uncorrupted  by  the  traitor.  We  may  assume  that  nodes  5-8  forward  whatever 
they  receive  on  their  incoming  edges  to  all  their  outgoing  edges,  so  node  10  receives 
all  four  values  sent  from  nodes  1-4.  From  these,  it  can  decode  the  entire  message,  so 
it  is  not  a  problem  to  construct  a  trustworthy  value  to  send  along  (10,  D).  However, 
node  9  has  access  to  only  three  of  the  four  values  sent  from  nodes  1-4,  from  which  it 
is  not  obvious  how  to  construct  a  trustworthy  value.  The  key  problem  in  designing 
a  successful  code  is  to  design  the  values  placed  on  edges  (1,5),  (2,6),  (3,7)  to  be 
pairwise  independent,  but  such  that  if  one  value  is  corrupted,  it  is  always  possible 
to construct a trustworthy value to transmit on (9, D). This is impossible to
do using a finite field code. For example, suppose node 9 receives values for
x, y, x + y, one of which may be corrupted by the traitor. If the linear constraint
among these three values does not hold (that is, if the received value for x + y
does not match the sum of the value for x and the value for y), then any of the
three values may be the incorrect one. Therefore, from node 9's perspective, any
of nodes 1, 2, or 3 could be the traitor. In order to produce a trustworthy symbol,
it must rule out at least one node as a possible traitor. If, for example, it could
determine that the traitor was either node 1 or 2 but not 3, then the value sent
along (3, 7) could be forwarded to (9, D) with a guarantee of correctness. Sending
x, y, x + y along the edges (1, 5), (2, 6), (3, 7) does not allow this. In fact, sending
any three elements of a finite field, subject to a single linear constraint, cannot
work, but a Polytope Code can.

2.7.1  Coding  Strategy 

We  now  begin  to  describe  a  capacity-achieving  Polytope  Code  for  the  Caterpillar 
network.  We  do  so  first  by  describing  how  the  code  is  built  out  of  a  probability 
distribution,  and  the  properties  we  might  like  this  probability  distribution  to  have. 
Subsequently,  we  give  an  explicit  construction  for  a  probability  distribution  derived 
from a polytope in a real vector space, and show that it has the desired properties.

Let X, Y, Z, W be jointly distributed random variables on the same finite alphabet
$\mathcal{X}$. Assume all probabilities on these random variables are rational. For a block
length n that is a multiple of the lowest common denominator of the joint distribution
of X, Y, Z, W, we may consider the set of all joint sequences $(x^n y^n z^n w^n)$ with
joint type exactly equal to this joint distribution. Denote this set $T^n(XYZW)$.
We know from the theory of types that

$$|T^n(XYZW)| \ge \frac{1}{(n+1)^{|\mathcal{X}|^4}}\, 2^{nH(XYZW)}. \tag{2.6}$$

Our coding strategy will be to associate each element of $T^n(XYZW)$ with a
distinct message. Given the message, we find the associated four sequences
$x^n, y^n, z^n, w^n$, and transmit them on the four edges out of nodes 1, 2, 3, 4
respectively. Doing this requires placing a sequence in $\mathcal{X}^n$ on each edge. Therefore
the rate of this code is

$$\frac{\log|T^n(XYZW)|}{n\log|\mathcal{X}|} \ge \frac{H(XYZW)}{\log|\mathcal{X}|} - \frac{|\mathcal{X}|^4\log(n+1)}{n\log|\mathcal{X}|}.$$

Note that for sufficiently large n, we may operate at a rate arbitrarily close to

$$\frac{H(XYZW)}{\log|\mathcal{X}|}.$$
Because of the adversary, the actual sequences sent out of nodes 1-4 may differ
from what is sent out of the source. Let $\tilde{x}^n, \tilde{y}^n, \tilde{z}^n, \tilde{w}^n$ be the four sequences
as they actually appear on the four edges; at most one of these may differ from
$x^n, y^n, z^n, w^n$. We may now define random variables $\tilde{X}, \tilde{Y}, \tilde{Z}, \tilde{W}$ to have joint
distribution equal to the joint type of $(\tilde{x}^n, \tilde{y}^n, \tilde{z}^n, \tilde{w}^n)$. This is a formal definition; these
variables do not actually exist, but nodes that have access to these sequences can
construct the related random variables. For example, node 9 observes $\tilde{x}^n, \tilde{y}^n, \tilde{z}^n$,
so it knows exactly the joint distribution of $\tilde{X}, \tilde{Y}, \tilde{Z}$. The advantage of this coding
strategy is that node 9 can now check whether the distribution of these random
variables matches that of X, Y, Z. If the distributions differ, a traitor must be
present.
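The type check performed at a node can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper name `joint_type`, the block length 6, and the target distribution (uniform on the binary triples with x + y + z = 1) are chosen here for the example and are not the code used for the Caterpillar network.

```python
from collections import Counter
from fractions import Fraction


def joint_type(*sequences):
    """Empirical joint distribution (joint type) of equal-length sequences."""
    n = len(sequences[0])
    assert all(len(s) == n for s in sequences)
    counts = Counter(zip(*sequences))
    return {symbols: Fraction(c, n) for symbols, c in counts.items()}


# Illustrative target distribution of (X, Y, Z); zero-probability triples omitted.
target = {
    (0, 0, 1): Fraction(1, 3),
    (0, 1, 0): Fraction(1, 3),
    (1, 0, 0): Fraction(1, 3),
}

# Sequences as sent by the source: their joint type equals the target exactly.
x = [0, 0, 1, 0, 0, 1]
y = [0, 1, 0, 1, 0, 0]
z = [1, 0, 0, 0, 1, 0]

honest = joint_type(x, y, z)
# The same sequences with the last symbol of z altered by a traitor.
tampered = joint_type(x, y, [1, 0, 0, 0, 1, 1])

print(honest == target)    # type matches: no evidence of a traitor
print(tampered == target)  # type differs: a traitor must be present
```

Exact rational arithmetic is used so that "joint type exactly equal to the distribution" is tested without floating-point tolerance.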

The  sequences  placed  on  the  edges  out  of  nodes  1-4  must  be  such  that  nodes 
9  and  10  can  successfully  find  trustworthy  values  to  place  on  edges  (9,  D)  and 
(10,  D).  In  order  for  this  to  be  possible,  any  two  of  X,  Y,  Z,  W  must  determine  the 
others.  Moreover,  as  we  have  discussed,  the  significant  difficulty  is  allowing  node 
9  to  narrow  down  the  list  of  possible  traitors  to  just  two  out  of  nodes  1-3.  The 
following  property  on  the  variables  allows  this. 

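Property 1 The distribution of (X, Y, Z) is such that for any three random
variables $(\tilde{X}, \tilde{Y}, \tilde{Z})$ satisfying

$$(\tilde{X}, \tilde{Y}) \sim (X, Y) \tag{2.8}$$
$$(\tilde{X}, \tilde{Z}) \sim (X, Z) \tag{2.9}$$
$$(\tilde{Y}, \tilde{Z}) \sim (Y, Z) \tag{2.10}$$

the following holds:

$$(\tilde{X}, \tilde{Y}, \tilde{Z}) \sim (X, Y, Z). \tag{2.11}$$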
Suppose we have random variables X, Y, Z, W such that (X, Y, Z) satisfy Property 1.
We will show in Sec. 2.7.2 that such a set of random variables exists. The
process at node 9 to transmit a message to the destination is as follows. Node 9
observes $\tilde{X}, \tilde{Y}, \tilde{Z}$. If the joint distribution of these three variables matches that of
(X, Y, Z), then all three sequences $\tilde{x}^n, \tilde{y}^n, \tilde{z}^n$ are trustworthy, because if a traitor is
among nodes 1-3, it must have transmitted the true value of its output sequence,
or else the empirical type would not match, due to the fact that any two of the four
variables determine the other two. Therefore, node 9 forwards $\tilde{x}^n$ to the destination,
confident that it is correct. Meanwhile, node 10 can also observe $\tilde{X}, \tilde{Y}, \tilde{Z}$, and
so it forwards $\tilde{y}^n$ to the destination. If the two distributions are different, then by
Property 1, one of (2.8), (2.9), or (2.10) must not hold. Suppose, for example, that
$(\tilde{X}, \tilde{Y}) \not\sim (X, Y)$. If both node 1 and 2 were honest, then by our code construction,
(2.8) would hold. Since it did not, one of nodes 1 or 2 must be the traitor. We
have thereby succeeded in reducing the number of nodes that may be the traitor
to two, so node 9 may forward $\tilde{z}^n$ to the destination with confidence. Similarly,
whichever pairwise distribution does not match, node 9 can always forward the
sequence not involved in the mismatch. Meanwhile, node 10 may forward $\tilde{w}^n$ to
the destination, since in any case the traitor has been localized to nodes 1-3. The
destination always receives two of the four sequences, both guaranteed correct;
therefore it may decode.
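The decision rule at node 9 can be sketched as follows. The function names (`joint_type`, `marginal`, `node9_forward`) and the small target distribution are assumptions made for illustration; the rule itself follows the description above: forward the first sequence if the three-way type matches, and otherwise forward the sequence not involved in a mismatching pair.

```python
from collections import Counter
from fractions import Fraction


def joint_type(*sequences):
    """Empirical joint distribution of equal-length sequences."""
    n = len(sequences[0])
    counts = Counter(zip(*sequences))
    return {sym: Fraction(c, n) for sym, c in counts.items()}


def marginal(dist, keep):
    """Marginal of a joint distribution on the coordinates listed in keep."""
    out = {}
    for sym, prob in dist.items():
        key = tuple(sym[i] for i in keep)
        out[key] = out.get(key, 0) + prob
    return out


def node9_forward(target_xyz, rx, ry, rz):
    """Return the index (0, 1 or 2) of a received sequence that is safe to forward."""
    received = (rx, ry, rz)
    if joint_type(rx, ry, rz) == target_xyz:
        return 0  # all three sequences are trustworthy; forward the first
    for i, j in ((0, 1), (0, 2), (1, 2)):
        if joint_type(received[i], received[j]) != marginal(target_xyz, (i, j)):
            # The traitor is node i+1 or node j+1, so the third sequence is clean.
            return ({0, 1, 2} - {i, j}).pop()
    # Reaching this point would contradict Property 1 for the target distribution.
    raise ValueError("all pairwise types match but the joint type does not")


# Illustrative target: uniform on binary triples with x + y + z = 1.
target = {(0, 0, 1): Fraction(1, 3), (0, 1, 0): Fraction(1, 3), (1, 0, 0): Fraction(1, 3)}
x = [0, 0, 1, 0, 0, 1]
y = [0, 1, 0, 1, 0, 0]
z = [1, 0, 0, 0, 1, 0]

print(node9_forward(target, x, y, z))                   # no tampering
print(node9_forward(target, x, y, [1, 0, 0, 0, 1, 1]))  # node 3 altered z
```

The final `raise` is the branch that Property 1 rules out; for a distribution without that property, the node could be left with no sequence it can certify.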

2.7.2  The  Polytope  Distribution 

All that remains to prove that rate 2 can be achieved for the Caterpillar network is
to show that there exist variables X, Y, Z, W such that any two variables determine
the other two, satisfying Property 1, and such that $H(XYZW)/\log|\mathcal{X}| = 2$. In fact, this
Table 2.1: A simple distribution satisfying Property 1.

x   y   z   Pr(X = x, Y = y, Z = z)
0   0   0   0
0   0   1   1/3
0   1   0   1/3
0   1   1   0
1   0   0   1/3
1   0   1   0
1   1   0   0
1   1   1   0

is not quite possible. If the entropy requirement holds exactly, then X, Y, Z, W
must be pairwise independent, and if so Property 1 cannot hold, because we can
take $\tilde{X}, \tilde{Y}, \tilde{Z}$ to be jointly independent with $\tilde{X} \sim X$, $\tilde{Y} \sim Y$, and $\tilde{Z} \sim Z$. This
satisfies (2.8)-(2.10) but not (2.11). In fact, we need only show that a suitable
set of variables exists such that $H(XYZW)/\log|\mathcal{X}| \ge 2 - \epsilon$ for arbitrarily small
$\epsilon > 0$. This is possible, and indicates that the set of distributions satisfying
Property 1 is not a topologically closed set.

The most unusual aspect of the Polytope Code is Property 1 and its generalization,
to be stated as Theorem 5 in Sec. 2.9. Therefore, before constructing
a distribution used to achieve rate 2 for the Caterpillar network, we illustrate in
Table 2.1 a very simple distribution on three binary variables satisfying
Property 1. This distribution is only on X, Y, Z; to simplify we momentarily
leave out W, because it is not involved in Property 1. We encourage the reader
to manually verify Property 1 for this distribution. Observe that X, Y, Z given in
Table 2.1 may be alternatively expressed as being uniformly distributed on the set
of $x, y, z \in \{0, 1\}$ satisfying x + y + z = 1. This set is a polytope, which motivates
the more general construction of the distribution to follow.
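One way to carry out the suggested verification is sketched below. It relies on a standard fact about three binary variables that is not stated in the text: two joint distributions with the same three pairwise marginals can differ only by a multiple t of the parity pattern (+1 on triples with even sum, -1 on triples with odd sum). The sketch checks that this pattern is invisible to every pairwise marginal, then finds the range of t that keeps all probabilities nonnegative. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

points = list(product((0, 1), repeat=3))

# Table 2.1: uniform on the binary triples with x + y + z = 1.
p = {pt: (Fraction(1, 3) if sum(pt) == 1 else Fraction(0)) for pt in points}

# Parity pattern: +1 on even-sum triples, -1 on odd-sum triples.
sign = {pt: (-1) ** sum(pt) for pt in points}


def pair_marginal(f, i, j):
    """Sum a function on triples down to coordinates (i, j)."""
    out = {}
    for pt, value in f.items():
        key = (pt[i], pt[j])
        out[key] = out.get(key, 0) + value
    return out


# The parity pattern contributes nothing to any pairwise marginal.
for i, j in ((0, 1), (0, 2), (1, 2)):
    assert all(v == 0 for v in pair_marginal(sign, i, j).values())

# p + t * sign must be nonnegative everywhere:
#   even-sum triples require t >= -p, odd-sum triples require t <= p.
t_min = max(-p[pt] for pt in points if sign[pt] == 1)
t_max = min(p[pt] for pt in points if sign[pt] == -1)
print(t_min, t_max)
```

Both bounds come out equal to zero, because (0,0,0) and (1,1,1) each have probability zero and sit on opposite sides of the parity pattern. The only distribution with the pairwise marginals of Table 2.1 is therefore Table 2.1 itself, which is Property 1 for this example.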

We now construct the distribution used to achieve rate 2 for the Caterpillar
network. For any positive integer k, consider the set of $x, y, z, w \in \{-k, \dots, k\}$
satisfying

$$x + y + z = 0 \tag{2.12}$$
$$3x - y + 2w = 0. \tag{2.13}$$

This is the set of integer lattice points in a polytope. Let X, Y, Z, W be uniform
over these points. Observe first that this distribution satisfies the requirement that
any two variables determine the others. The region of (x, y) pairs with positive
probability is shown in Figure 2.5. Note that even though the subspace defined
by (2.12)-(2.13) projected onto the (x, y) plane is two-dimensional, X and Y are
not statistically independent, because the boundedness of Z and W requires that
X and Y satisfy certain linear inequalities. Nevertheless, the area of the polygon
shown in Figure 2.5 grows as $\Theta(k^2)$. Hence the rate of the code resulting from this
distribution is

$$\frac{H(XYZW)}{\log|\mathcal{X}|} = \frac{\log \Theta(k^2)}{\log(2k + 1)}. \tag{2.14}$$

For  large  k,  this  can  be  made  arbitrarily  close  to  2.  When  k  is  large,  any  pair  of 
the  four  variables  are  nearly  statistically  independent,  in  that  their  joint  entropy 
is  close  to  the  sum  of  their  individual  entropies.  We  have  therefore  constructed 
something like a (4,2) MDS code. In fact, if we reinterpret (2.12)-(2.13) as
constraints on elements x, y, z, w of a finite field, the resulting finite subspace would be
exactly a (4,2) MDS code. This illustrates a general principle of Polytope Codes:
any code construction on a finite field can be immediately used to construct a
Polytope Code, and many of the properties of the original code will carry over.
The  resulting  code  will  be  substantially  harder  to  implement,  in  that  it  involves 
much  longer  block-lengths,  and  more  complicated  coding  functions,  but  it  allows 
properties  like  Property  1  to  hold. 
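The claims about this distribution can be checked numerically for small k. The sketch below enumerates the lattice points defined by (2.12)-(2.13), confirms that every pair of coordinates identifies the point, and evaluates log(number of points) / log(2k + 1), which equals the rate because the distribution is uniform. The function names are hypothetical and the brute-force enumeration is for illustration only.

```python
from itertools import combinations
from math import log


def caterpillar_points(k):
    """Integer points with entries in {-k,...,k} satisfying
    x + y + z = 0 and 3x - y + 2w = 0."""
    pts = []
    for x in range(-k, k + 1):
        for y in range(-k, k + 1):
            z = -x - y
            if abs(z) > k or (y - 3 * x) % 2 != 0:
                continue  # z out of range, or w would not be an integer
            w = (y - 3 * x) // 2
            if abs(w) <= k:
                pts.append((x, y, z, w))
    return pts


def pairs_determine_point(pts):
    """True if every pair of coordinates takes distinct values on distinct points."""
    return all(
        len({(p[i], p[j]) for p in pts}) == len(pts)
        for i, j in combinations(range(4), 2)
    )


def rate(k):
    """H(XYZW) / log|alphabet| for the uniform distribution on the points."""
    return log(len(caterpillar_points(k))) / log(2 * k + 1)


for k in (5, 20, 200):
    print(k, len(caterpillar_points(k)), round(rate(k), 3))
```

The printed rates stay below 2 and increase with k, in line with the slow logarithmic convergence suggested by (2.12)-(2.13) and the growth of the polygon's area.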
Figure 2.5: An example polytope projected into the (x,y) plane.


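All that remains is to verify Property 1 on the polytope distribution. Assuming
$\tilde{X}, \tilde{Y}, \tilde{Z}$ satisfy (2.8)-(2.10), we may write

$$E[(\tilde{X} + \tilde{Y} + \tilde{Z})^2] = E[\tilde{X}^2 + \tilde{Y}^2 + \tilde{Z}^2 + 2\tilde{X}\tilde{Y} + 2\tilde{X}\tilde{Z} + 2\tilde{Y}\tilde{Z}] \tag{2.15}$$
$$= E[X^2 + Y^2 + Z^2 + 2XY + 2XZ + 2YZ] \tag{2.16}$$
$$= E[(X + Y + Z)^2] \tag{2.17}$$
$$= 0 \tag{2.18}$$

where (2.16) holds from (2.8)-(2.10), and because each term in the sum involves
at most two of the three variables; and (2.18) holds because X + Y + Z = 0 by
construction. Since $(\tilde{X} + \tilde{Y} + \tilde{Z})^2$ is nonnegative and has zero expectation,
$\tilde{X} + \tilde{Y} + \tilde{Z} = 0$ with probability one. Now we may write

$$(\tilde{X}, \tilde{Y}, \tilde{Z}) = (\tilde{X}, \tilde{Y}, -\tilde{X} - \tilde{Y}) \tag{2.19}$$
$$\sim (X, Y, -X - Y) \tag{2.20}$$
$$= (X, Y, Z) \tag{2.21}$$

where (2.20) holds by (2.8). This concludes the proof of Property 1.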


Observe that the linear constraint X + Y + Z = 0 was in no way special; the
proof could work just as well under any linear constraint with nonzero coefficients
for all three variables. This completes the proof of correctness for the Polytope
Code for the Caterpillar network.
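As a sketch of that remark, suppose the constraint is $\alpha X + \beta Y + \gamma Z = 0$, where the nonzero coefficients $\alpha, \beta, \gamma$ are symbols introduced here for illustration. The same two steps go through, with tildes marking the variables that are only assumed to satisfy (2.8)-(2.10):

```latex
\begin{align*}
E\big[(\alpha\tilde{X} + \beta\tilde{Y} + \gamma\tilde{Z})^2\big]
  &= E\big[\alpha^2\tilde{X}^2 + \beta^2\tilde{Y}^2 + \gamma^2\tilde{Z}^2
      + 2\alpha\beta\tilde{X}\tilde{Y} + 2\alpha\gamma\tilde{X}\tilde{Z}
      + 2\beta\gamma\tilde{Y}\tilde{Z}\big] \\
  &= E\big[(\alpha X + \beta Y + \gamma Z)^2\big] = 0, \\
(\tilde{X}, \tilde{Y}, \tilde{Z})
  &= \Big(\tilde{X}, \tilde{Y}, -\tfrac{1}{\gamma}(\alpha\tilde{X} + \beta\tilde{Y})\Big)
   \sim \Big(X, Y, -\tfrac{1}{\gamma}(\alpha X + \beta Y)\Big) = (X, Y, Z).
\end{align*}
```

The second equality in the first chain uses only pairwise distributions, since every term involves at most two variables; the last line divides by $\gamma$, which is where a nonzero coefficient is needed.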
2.8  A  Polytope  Code  for  the  Cockroach  Network

We  return  now  to  the  Cockroach  network,  and  demonstrate  a  capacity-achieving 
Polytope  Code  for  it.  We  do  this  not  to  find  the  capacity  for  the  network,  because 
we  have  already  done  so  with  the  simpler  code  in  Sec.  2.6,  but  to  illustrate  a 
Polytope  Code  on  a  network  satisfying  the  conditions  of  Theorem  4,  which  are 
somewhat  different  from  the  Caterpillar  network. 

In Sec. 2.6, we illustrated how performing comparisons and transmitting comparison
bits through the network can help defeat traitors. In Sec. 2.7, we illustrated
how a code can be built out of a distribution on a polytope, and how a special
property of that distribution comes into play in the operation of the code. To build a
Polytope Code for the Cockroach network, we combine these two ideas: the primary
data sent through the network comes from the distribution on a polytope,
but then comparisons are performed in the network in order to localize the traitor.

The first step in constructing a Polytope Code is to describe a distribution over
a polytope. That is, we define a linear subspace in a real vector space, and take a
uniform distribution over the polytope defined by the set of vectors with entries
in $\{-k, \dots, k\}$ for some integer k. The nature of this distribution depends on the
characteristics of the linear subspace. For our code for the Cockroach network, we
need one that is the equivalent of a (6,2) MDS code. That is, the linear subspace
sits in $\mathbb{R}^6$, has dimension 2, and is defined by four constraints such that any two
variables determine the others. One choice for the subspace, for example, would
be the set of (a, b, c, d, e, f) satisfying

$$a + b + c = 0 \tag{2.22}$$
$$a - b + d = 0 \tag{2.23}$$
$$a + 2b + e = 0 \tag{2.24}$$
$$2a + b + f = 0. \tag{2.25}$$

Let the random variables A, B, C, D, E, F have joint distribution uniformly
distributed over the polytope defined by (2.22)-(2.25) and $a, b, c, d, e, f \in \{-k, \dots, k\}$.
By a similar argument to that in Sec. 2.7, for large k,

$$\frac{H(ABCDEF)}{\log(2k + 1)} \approx 2. \tag{2.26}$$


We choose a block length n and associate each message with a joint sequence
$(a^n b^n c^n d^n e^n f^n)$ with joint type exactly equal to the distribution of the six variables.
For large n and k, we may place one of the sequences $a^n$ through $f^n$ on each unit capacity edge
in the network and operate at rate 2. These six sequences are generated at the
source and then routed through the network as shown in Fig. 2.6. For convenience,
the figure refers to the variables as scalars instead of vectors, but we always mean
them to be sequences.
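The MDS-like property of the subspace (2.22)-(2.25) can be checked directly. In the sketch below, each variable is written as a linear function of the free pair (a, b) by solving the four constraints; the dictionary `GENERATOR` and the helper names are introduced here for illustration. Two variables determine all six exactly when their coefficient rows are linearly independent over the reals, that is, when the 2x2 determinant is nonzero.

```python
from itertools import combinations

# Each variable as (coefficient of a, coefficient of b), from (2.22)-(2.25):
#   c = -a - b,  d = -a + b,  e = -a - 2b,  f = -2a - b.
GENERATOR = {
    "a": (1, 0),
    "b": (0, 1),
    "c": (-1, -1),
    "d": (-1, 1),
    "e": (-1, -2),
    "f": (-2, -1),
}


def any_two_determine_all(gen):
    """True if every pair of rows has a nonzero 2x2 determinant."""
    return all(
        u[0] * v[1] - u[1] * v[0] != 0
        for u, v in combinations(gen.values(), 2)
    )


def point(a, b):
    """The six coordinates of the subspace point with free values (a, b)."""
    return {name: row[0] * a + row[1] * b for name, row in GENERATOR.items()}


print(any_two_determine_all(GENERATOR))
pt = point(3, -5)
print(pt, pt["d"] + pt["e"] - pt["f"])
```

The last printed value illustrates a single linear constraint tying three of the variables together (here D + E - F = 0), which follows from the four defining equations.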


As in Sec. 2.7, we define $\tilde{A}, \tilde{B}, \tilde{C}, \tilde{D}, \tilde{E}, \tilde{F}$ to have joint distribution equal to
the type of the six sequences as they actually appear in the network, which may
differ from the sequences sent by the source because of the adversary. In addition to
forwarding one sequence as shown in Fig. 2.6, nodes 4 and 5 perform more elaborate
operations. In particular, they compare the types of their received sequences with
the original distribution. For example, node 4 receives the two sequences $\tilde{b}^n$ and
$\tilde{c}^n$, from which it can construct $\tilde{B}$ and $\tilde{C}$. It checks whether the joint distribution
of $(\tilde{B}, \tilde{C})$ matches that of (B, C), and forwards a single bit relaying whether they
Figure 2.6: A capacity-achieving Polytope Code for the Cockroach Network.

agree along the edge (4, D) in addition to the sequence $\tilde{c}^n$. This single bit costs
asymptotically negligible rate, so it has no effect on the achieved rate of the code
for large n and k. Node 5 performs a similar action, comparing the distribution of
$(\tilde{D}, \tilde{E})$ with that of (D, E), and transmitting a comparison bit to the destination.

We now describe the decoding operation at the destination. The first step is
to compile a list of possible traitors. We denote this list $\mathcal{L} \subseteq \{1, \dots, 5\}$. The
destination does this in the following way. Since the code is entirely known, with
no randomness, it can determine whether all its received data could be induced
if each node were the traitor. That is, it considers each possible message, each
possible traitor, and each possible set of values on the output edges of that traitor.
Any given combination of these three things gives a deterministic set of values
received at the destination, which may be compared to the set of values that the
destination has in fact received. If a node i is such that it could have been the
traitor and induced the set of values received at the destination, for some message
and some action by node i, then i is put onto $\mathcal{L}$. This process ensures that the
true traitor, even though it may not be known by the destination, is surely in $\mathcal{L}$.
Note that this procedure could in principle be done for any code, not necessarily
a Polytope Code.
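Because the procedure applies to any deterministic code, it can be sketched on a toy network rather than the Cockroach network. In the hypothetical example below, three relays each forward a one-bit message to the destination, and a traitor may replace its own output with any bit. The function `possible_traitors` and its callback arguments are names invented for this illustration.

```python
def possible_traitors(received, messages, nodes, honest_outputs, tamper):
    """Return the nodes that, acting alone as the traitor, could have
    produced `received` for some message and some action."""
    suspects = set()
    for node in nodes:
        for message in messages:
            clean = honest_outputs(message)
            if any(out == received for out in tamper(clean, node)):
                suspects.add(node)
                break  # one consistent (message, action) pair is enough
    return suspects


# Toy code (an assumption for illustration): relays 1-3 each forward the bit.
def honest_outputs(message):
    return (message, message, message)


def tamper(clean, node):
    """All outputs the destination could see if `node` alone misbehaves."""
    for value in (0, 1):
        out = list(clean)
        out[node - 1] = value
        yield tuple(out)


for rx in [(0, 0, 0), (0, 0, 1), (0, 1, 1)]:
    print(rx, possible_traitors(rx, (0, 1), (1, 2, 3), honest_outputs, tamper))
```

When the received values are consistent with an honest network, every node remains a suspect, since a traitor may choose to behave honestly. The exhaustive search is exponential in general; the sketch only illustrates the definition of the list.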

The next step in the decoding process is to use $\mathcal{L}$ to decide from which of
the four symbols available at the destination to decode. Since any pair of the six
original symbols contain all the information in the message, if at least two of the
four symbols a, c, d, f can be determined to be trustworthy by the destination, then
it can decode. The destination discards any symbol that was touched by all nodes
in $\mathcal{L}$, and decodes from the rest. For example, if $\mathcal{L} = \{2\}$, then the destination
discards c, d and decodes from a, f. If $\mathcal{L} = \{2, 4\}$, the destination discards just
c (because it is the only symbol touched by both nodes 2 and 4) and decodes
from a, d, f. If $\mathcal{L} = \{1, \dots, 5\}$, then it discards no symbols and decodes from all
four.
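The discard rule is a subset test, as the following sketch shows. The table `TOUCHED_BY`, which records the nodes that handle each symbol on its way to the destination, is an assumption inferred from the worked examples in the text (node 2 touches c and d, node 4 touches c) and from the pairwise cases of the correctness proof; Fig. 2.6 is the authoritative routing.

```python
# Assumed routing: which nodes each destination symbol passes through.
TOUCHED_BY = {
    "a": {1},
    "c": {2, 4},
    "d": {2, 5},
    "f": {3},
}


def symbols_to_decode(suspects):
    """Keep every symbol that is NOT touched by all suspects.

    `suspects` is assumed non-empty, since the true traitor is always listed.
    """
    suspects = set(suspects)
    return sorted(
        symbol for symbol, nodes in TOUCHED_BY.items()
        if not suspects <= nodes  # discard only if every suspect touched it
    )


print(symbols_to_decode({2}))
print(symbols_to_decode({2, 4}))
print(symbols_to_decode({1, 2, 3, 4, 5}))
```

Under the assumed routing, the three calls reproduce the three examples given in the text.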

To prove the correctness of this code, we must show that the destination never
decodes from a symbol that was altered by the traitor. This is easy to see if $|\mathcal{L}| = 1$,
because in this case the destination knows exactly which node is the traitor, and
it simply discards all symbols that may have been influenced by this node. Since
no node touches more than two of the symbols available at the destination, there
are always at least two remaining from which to decode.

More complicated is when $|\mathcal{L}| \ge 2$. In this case, the decoding process, as
described above, sometimes requires the destination to decode from symbols touched
by the traitor. For example, suppose node 2 were the traitor, and $\mathcal{L} = \{2, 4\}$. The
destination discards c, since it is touched by both nodes 2 and 4, but it decodes
from the other available symbols: a, d, f. In particular, the destination uses d to
decode, even though it is touched by node 2. Therefore, to prove correctness we
must show that it was impossible for node 2 to have transmitted to node 5 anything
but the true value of d. What we use to prove this is the fact that $\mathcal{L}$ contains
node 4. That means that node 2 must have acted in a way such that it appears to
the destination that node 4 could be the traitor. This induces constraints on the
behavior of node 2. For instance, the comparison that occurs at node 5 between
d and e must succeed. If it did not, then the destination would receive a bit
indicating a failed comparison from node 5. This precludes node 4 being the traitor,
because if it were, it could not have induced this failed comparison bit. Therefore
the distribution of $(\tilde{D}, \tilde{E})$ must be identical to that of (D, E). This constitutes a
constraint on node 2 in its transmission of d. Moreover, $(\tilde{D}, \tilde{F}) \sim (D, F)$, because
the destination may observe d and f, so it could detect a difference between these
two distributions if it existed. Since both are untouched by node 4, if the
distributions did not match then node 4 would not be placed on $\mathcal{L}$. Finally, we have
that $(\tilde{E}, \tilde{F}) \sim (E, F)$. This holds simply because neither e nor f is touched by
the traitor node 2. Summarizing, we have


$$(\tilde{D}, \tilde{E}) \sim (D, E), \tag{2.27}$$
$$(\tilde{D}, \tilde{F}) \sim (D, F), \tag{2.28}$$
$$(\tilde{E}, \tilde{F}) \sim (E, F). \tag{2.29}$$

Given these three conditions, we apply Property 1 to conclude that $(\tilde{D}, \tilde{E}, \tilde{F}) \sim
(D, E, F)$. We may do this because, as we argued in Sec. 2.7, Property 1 holds
for any three variables in a polytope subject to a single linear constraint with
nonzero coefficients on each one. Since we have constructed the 6 variables to be
a (6,2) MDS code, this is true here (e.g. in the space defined by (2.22)-(2.25), the
three variables D, E, F are subject to D + E - F = 0). Since e and f together
specify the entire message, in order for this three-way distribution to match, the
only choice for $\tilde{d}$ is the true value of d. This concludes the proof for this case,
because we have shown that in order for node 2 to act in a way so as to cause
$\mathcal{L} = \{2, 4\}$, it cannot have altered the value of d at all. Therefore the destination
is justified in using it to decode the message.


The above analysis holds for any $\mathcal{L}$ containing {2, 4}. That is, if node 2 is the
traitor, and $4 \in \mathcal{L}$, then node 2 cannot corrupt d. To prove correctness of the
code, it is enough to prove a similar fact for every pair of nodes. In particular, we wish
to show that if node i is the traitor, and node $j \in \mathcal{L}$, then node i can only corrupt
values also touched by node j. This implies that if node i is the traitor, it can
only corrupt symbols touched by every node in $\mathcal{L}$. Therefore the destination is
justified in only discarding symbols touched by every node in $\mathcal{L}$.

Moreover, it is enough to consider each unordered pair only once. For example,
as we have already proven this fact for i = 2 and j = 4, we do not need to perform
a complete proof for i = 4 and j = 2. This is justified as follows. Suppose node 4 is
the traitor and $2 \in \mathcal{L}$. We know from the above argument that when node 2 is the
traitor and $4 \in \mathcal{L}$, d is uncorrupted, meaning $(\tilde{A}, \tilde{D}, \tilde{F}) \sim (A, D, F)$. This means
that if $(\tilde{A}, \tilde{D}, \tilde{F}) \not\sim (A, D, F)$ and $4 \in \mathcal{L}$, then $2 \notin \mathcal{L}$. Hence, if $2, 4 \in \mathcal{L}$, then
$(\tilde{A}, \tilde{D}, \tilde{F}) \sim (A, D, F)$. Since when node 4 is the traitor, a and f are uncorrupted,
this implies that the only possible transmitted value of d is the true value of d.

We  now  complete  the  proof  of  correctness  of  the  proposed  Polytope  Code  for 
the  Cockroach  network  by  considering  all  unordered  pairs  of  potential  traitors  in 
the  network: 

(1,2): Suppose node 2 is the traitor and $1 \in \mathcal{L}$. Since these nodes share no
symbols, we must show that neither c nor d can be corrupted by node 2. We
have

$$(\tilde{A}, \tilde{B}, \tilde{E}, \tilde{F}) \sim (A, B, E, F), \tag{2.30}$$
$$(\tilde{D}, \tilde{E}) \sim (D, E), \tag{2.31}$$
$$(\tilde{C}, \tilde{D}, \tilde{F}) \sim (C, D, F), \tag{2.32}$$

where (2.30) follows because these symbols are not touched by node 2, (2.31)
follows because the comparison at node 5 must succeed, and (2.32) follows
because node 1 would be discarded as a possible traitor if $(\tilde{C}, \tilde{D}, \tilde{F})$ did not
match at the destination. We may apply Property 1 on D, E, F to conclude
that $(\tilde{D}, \tilde{E}, \tilde{F}) \sim (D, E, F)$, therefore d cannot be corrupted. That c cannot
be corrupted follows from (2.32).

(1,3): Suppose node 1 is the traitor and $3 \in \mathcal{L}$. We must show that node 1 cannot
corrupt a. We have that $(\tilde{A}, \tilde{C}, \tilde{D}) \sim (A, C, D)$, because these three symbols
are not touched by node 3, and are available at the destination. Since c and
d determine the message, this single constraint is enough to conclude that
node 1 cannot corrupt a. This illustrates a more general principle: when
considering the pair of nodes (i, j), if the number of symbols available at the
destination touched by neither i nor j is at least as large as the rate of the
code, we may trivially conclude that no symbols can be corrupted. In fact,
this principle works even for finite-field linear codes.

(1,4): Follows exactly as (1,3).

(1,5): Follows exactly as (1,3).

(2,3): Follows exactly as (1,2).

(2,4): Proof above.

(2,5): Follows exactly as (2,4).
(3,4): Follows exactly as (1,3).

(3,5): Follows exactly as (1,3).

(4,5): Follows exactly as (1,3).

2.9  The  Polytope  Code 

We  now  describe  the  general  structure  of  Polytope  Codes  and  state  their  important 
properties. Given a matrix $F \in \mathbb{Z}^{u \times m}$, consider the polytope

$$\mathcal{P}_k = \{x \in \mathbb{Z}^m : Fx = 0,\ |x_i| \le k \text{ for } i = 1, \dots, m\}. \tag{2.33}$$

We may also describe this polytope in terms of a matrix K whose columns form a
basis for the null-space of F. Let X be an m-dimensional random vector uniformly
distributed over $\mathcal{P}_k$. Take n to be a multiple of the least common denominator
of the distribution of X and let $T^n(X)$ be the set of sequences $x^n$ with joint type
exactly equal to this distribution. In a Polytope Code, each message is associated
with an element of $T^n(X)$. By the theory of types, the number of elements in this
set is at least $2^{n(H(X) - \epsilon)}$ for any $\epsilon > 0$ and sufficiently large n. Given a message
and the corresponding sequence $x^n$, each edge in the network holds a sequence
$x_i^n$ for some $i = 1, \dots, m$. As we have seen in the example Polytope Codes in
Sec. 2.7 and 2.8, the joint entropies for large k can be calculated just from the
properties of the linear subspace defined by F. The following lemma states this
property in general.

Lemma 1 For any $S \subseteq \{1, \dots, m\}$

$$\lim_{k \to \infty} \frac{H(X_S)}{\log k} = \mathrm{rank}(K_S) \tag{2.34}$$

where $K_S$ is the matrix made up of the rows of K corresponding to the elements
of S.


Proof: For any $S \subseteq \{1, \dots, m\}$, let $\mathcal{P}_k(X_S)$ be the projection of $\mathcal{P}_k$ onto the
subspace made up of dimensions S. The number of elements in $\mathcal{P}_k(X_S)$ is
$\Theta(k^{\mathrm{rank}(K_S)})$. That is, there exist constants $c_1$ and $c_2$ such that for sufficiently large k

$$c_1 k^{\mathrm{rank}(K_S)} \le |\mathcal{P}_k(X_S)| \le c_2 k^{\mathrm{rank}(K_S)}. \tag{2.35}$$

For $S = \{1, \dots, m\}$, because X is defined to be uniform on $\mathcal{P}_k$, (2.35) gives

$$\lim_{k \to \infty} \frac{H(X)}{\log k} = \lim_{k \to \infty} \frac{\log|\mathcal{P}_k|}{\log k} = \mathrm{rank}(K). \tag{2.36}$$

Moreover, by the uniform bound

$$\lim_{k \to \infty} \frac{H(X_S)}{\log k} \le \mathrm{rank}(K_S). \tag{2.37}$$

For any $S \subseteq \{1, \dots, m\}$, let $T \subseteq \{1, \dots, m\}$ be a minimal set of elements such
that $\mathrm{rank}(K_{S,T}) = \mathrm{rank}(K)$; i.e. such that $X_S, X_T$ completely specify X under the
constraint FX = 0. Note that $\mathrm{rank}(K_T) = \mathrm{rank}(K) - \mathrm{rank}(K_S)$. Hence

$$\lim_{k \to \infty} \frac{H(X_S)}{\log k} = \lim_{k \to \infty} \frac{H(X) - H(X_T \mid X_S)}{\log k} \tag{2.38}$$
$$\ge \lim_{k \to \infty} \frac{H(X) - H(X_T)}{\log k} \tag{2.39}$$
$$\ge \mathrm{rank}(K) - \mathrm{rank}(K_T) \tag{2.40}$$
$$= \mathrm{rank}(K_S). \tag{2.41}$$

Combining (2.37) with (2.41) completes the proof.

□


Recall that in a linear code operating over the finite field $\mathbb{F}$, we may express
the elements on the edges in a network $x \in \mathbb{F}^m$ as a linear combination of the
message, $x = Kw$, where $K$ is a linear transformation over the finite field, and $w$
is the message vector. Taking a uniform distribution on $w$ imposes a distribution
on $X$ satisfying

\[
H(X_S) = \operatorname{rank}(K_S) \log |\mathbb{F}|. \tag{2.42}
\]

This differs from (2.34) only by a constant factor, and in that (2.34) holds only in
the limit of large $k$. Hence, Polytope Codes achieve a similar set of entropy profiles
as standard linear codes. They may not be identical, because interpreting a matrix
$K_S$ as having integer values as opposed to values from a finite field may cause its
rank to change. However, the rank when interpreted as having integer values can
never be less than when interpreted as having finite field values, because any linear
equality on the integers will hold on a finite field, but not vice versa. The matrix
$K_S$ could represent, for example, the source-to-destination linear transformation
in a code, so its rank is exactly the achieved rate. Therefore, in fact, the Polytope
Code always achieves at least as high a rate as the identical linear code. Often,
when designing linear codes, the field size must be made sufficiently large before
the code works; here, sending $k$ to infinity serves much the same purpose, albeit
in an asymptotic way.
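The rank comparison above can be illustrated with a small numerical sketch. The matrix `K` below is a hypothetical example chosen for illustration, not one taken from a code in this chapter, and the helper names `rank_rational` and `rank_mod_p` are ours. The sketch computes the rank of the same integer matrix over the rationals and over a prime field, showing a case where the rank drops in the finite field but not over the integers.

```python
from fractions import Fraction


def rank_rational(rows):
    """Rank of an integer matrix over the rationals, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def rank_mod_p(rows, p):
    """Rank of the same matrix with entries reduced modulo a prime p."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(a * inv) % p for a in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical example: determinant is -2, so the rank drops only modulo 2.
K = [[1, 1], [1, -1]]
rank_over_q = rank_rational(K)
rank_over_f2 = rank_mod_p(K, 2)
rank_over_f3 = rank_mod_p(K, 3)
```

In this example the two rows are identical modulo 2, so the linear code over the field of size 2 loses a dimension that the integer interpretation keeps.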

In Sec. 2.7 and 2.8, we saw that Property 1 played an important role in the
functionality of the Polytope Codes. The following theorem states the more general
version of this property. It comprises the major property that Polytope Codes
possess and linear codes do not.

Theorem 5 (Fundamental Property of Polytope Codes) Let $X \in \mathbb{R}^m$ be a
random vector satisfying $FX = 0$. Suppose a second random vector $\tilde{X} \in \mathbb{R}^m$
satisfies the following $L$ constraints:

\[
A_l \tilde{X} \sim A_l X \quad \text{for } l = 1,\ldots,L \tag{2.43}
\]

where $A_l \in \mathbb{R}^{u_l \times m}$. The two vectors are equal in distribution if the following
properties on $F$ and the $A_l$ hold:

1. There exists a positive definite matrix $C$ such that

\[
F^T C F = \sum_{l=1}^{L} A_l^T \Sigma_l A_l \tag{2.44}
\]

for some $\Sigma_l \in \mathbb{R}^{u_l \times u_l}$.

2. There exists an $l^*$ and a matrix $G^*$ such that $FX = 0$ is equivalent to
$X = G^* A_{l^*} X$ for any random vector $X$. This is equivalent to
$\begin{bmatrix} F \\ A_{l^*} \end{bmatrix}$ having full column rank.

Proof: The following proof follows almost exactly the same argument as the
proof of Property 1 in Sec. 2.7. We may write

\begin{align}
E[(F\tilde{X})^T C (F\tilde{X})]
  &= \sum_{l=1}^{L} E[(A_l\tilde{X})^T \Sigma_l (A_l\tilde{X})] \tag{2.45}\\
  &= \sum_{l=1}^{L} E[(A_l X)^T \Sigma_l (A_l X)] \tag{2.46}\\
  &= E[(FX)^T C (FX)] \tag{2.47}\\
  &= 0 \tag{2.48}
\end{align}

where (2.45) and (2.47) follow from (2.44); (2.46) follows from (2.43), because
each term in the sum involves only $A_l \tilde{X}$ for some $l$; and (2.48) follows because $FX = 0$.
Because $C$ is positive definite, the quadratic form in (2.45) is nonnegative and vanishes
only at zero, so we have that $F\tilde{X} = 0$ with probability one. Therefore, by the second
property in the statement of the theorem, $\tilde{X} = G^* A_{l^*} \tilde{X}$. Hence

\begin{align}
\tilde{X} &= G^* A_{l^*} \tilde{X} \tag{2.49}\\
  &\sim G^* A_{l^*} X \tag{2.50}\\
  &= X. \tag{2.51}
\end{align}

This completes the proof.


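As an example of an application of Theorem 5, we use it to prove again
Property 1 in Sec. 2.7. Recall that variables $X, Y, Z \in \{-k,\ldots,k\}$ satisfy
$X + Y + Z = 0$, and the three pairwise distributions of $\tilde{X}, \tilde{Y}, \tilde{Z}$ match those of
$X, Y, Z$ as stated in (2.8)-(2.10). In terms of the notation of Theorem 5, we have $m = 3$, $L = 3$, and

\begin{align}
F &= \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}, \tag{2.52}\\
A_1 &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \tag{2.53}\\
A_2 &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{2.54}\\
A_3 &= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{2.55}
\end{align}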


To satisfy the second condition of Theorem 5, we may set $l^* = 1$, since the
single linear constraint $X + Y + Z = 0$ implies that

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix}
\begin{bmatrix} X \\ Y \end{bmatrix}. \tag{2.56}
\]

In fact, we could just as well have set $l^*$ to 2 or 3. To verify the first condition, we
need to check that there exist $\Sigma_l$ for $l = 1, 2, 3$ and a positive definite $C$ (in this
case, a positive scalar, because $F$ has only one row, so $C \in \mathbb{R}^{1\times 1}$)
satisfying (2.44).


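If we let

\[
\Sigma_l = \begin{bmatrix} \sigma_{l,11} & \sigma_{l,12} \\ \sigma_{l,21} & \sigma_{l,22} \end{bmatrix} \tag{2.57}
\]

then, for instance,

\[
A_1^T \Sigma_1 A_1 = \begin{bmatrix} \sigma_{1,11} & \sigma_{1,12} & 0 \\ \sigma_{1,21} & \sigma_{1,22} & 0 \\ 0 & 0 & 0 \end{bmatrix}. \tag{2.58}
\]

The right hand side of (2.44) expands to

\[
\sum_{l=1}^{3} A_l^T \Sigma_l A_l =
\begin{bmatrix}
\sigma_{1,11} + \sigma_{2,11} & \sigma_{1,12} & \sigma_{2,12} \\
\sigma_{1,21} & \sigma_{1,22} + \sigma_{3,11} & \sigma_{3,12} \\
\sigma_{2,21} & \sigma_{3,21} & \sigma_{2,22} + \sigma_{3,22}
\end{bmatrix}. \tag{2.59}
\]

Therefore, for suitable choices of $\{\Sigma_l\}_{l=1}^{3}$, we can produce any matrix for the right
hand side of (2.44). We may simply set $C = 1$ and calculate the resulting matrix
for the left hand side, then set $\{\Sigma_l\}_{l=1}^{3}$ appropriately. This allows us to apply
Theorem 5 to conclude that $(\tilde{X}, \tilde{Y}, \tilde{Z}) \sim (X, Y, Z)$.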
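As a quick check of this argument, the following Python sketch verifies (2.44) for the three-variable example with $C = 1$. The particular choice of $\Sigma_l$ (diagonal entries 1/2, off-diagonal entries 1, the same for every $l$) is our own illustrative assumption; many other choices work equally well. The helper names `matmul` and `transpose` are likewise ours.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


F = [[1, 1, 1]]
A = [
    [[1, 0, 0], [0, 1, 0]],  # A_1: selects (X, Y)
    [[1, 0, 0], [0, 0, 1]],  # A_2: selects (X, Z)
    [[0, 1, 0], [0, 0, 1]],  # A_3: selects (Y, Z)
]

half = Fraction(1, 2)
Sigma = [[half, 1], [1, half]]  # assumed choice, used for every l
C = [[1]]                       # positive scalar

# Left hand side of (2.44): F^T C F, the all-ones 3x3 matrix.
lhs = matmul(matmul(transpose(F), C), F)

# Right hand side of (2.44): sum over l of A_l^T Sigma_l A_l.
rhs = [[0] * 3 for _ in range(3)]
for A_l in A:
    term = matmul(matmul(transpose(A_l), Sigma), A_l)
    rhs = [[rhs[i][j] + term[i][j] for j in range(3)] for i in range(3)]

condition_holds = (lhs == rhs)
```

Each diagonal entry of the sum receives two contributions of 1/2, and each off-diagonal entry receives exactly one contribution of 1, matching the pattern in (2.59).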


In  our  proof  of  Theorem  4,  we  will  not  use  Theorem  5  in  its  most  general  form. 
Instead,  we  state  three  corollaries  that  will  be  more  convenient.  The  first  is  a 
generalization  of  the  above  argument  for  more  than  three  variables. 


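Corollary 1 Let $X$ satisfy $FX = 0$ for some $F \in \mathbb{Z}^{1\times m}$ with all nonzero values.
If $\tilde{X}$ satisfies

\begin{align}
(\tilde{X}_i, \tilde{X}_j) &\sim (X_i, X_j) \quad \text{for all } i, j \in \{1,\ldots,m\}, \tag{2.60}\\
(\tilde{X}_2,\ldots,\tilde{X}_m) &\sim (X_2,\ldots,X_m) \tag{2.61}
\end{align}

then $\tilde{X} \sim X$.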


Proof: We omit the explicit construction of the $A_l$ matrices corresponding to
the conditions (2.60), (2.61). The second condition for Theorem 5 is satisfied
by (2.61), since the linear constraint $FX = 0$ determines $X_1$ given $X_2, \ldots, X_m$. To
verify the first condition, note that from the conditions in (2.60), we may construct
an arbitrary matrix on the right hand side of (2.44) for suitable $\{\Sigma_l\}_{l=1}^{L}$. Therefore
we may simply set $C = 1$. □

Figure 2.7: The constraints on the random vector $\tilde{X}$ in Corollaries 2 (left) and 3
(right). Rectangles represent a constraint on the marginal distribution of all enclosed
variables; lines represent pairwise constraints on the two connected variables.

Corollary  1  considers  the  case  with  m  variables  and  m  —  1  degrees  of  freedom; 
i.e.  a  single  linear  constraint.  The  following  corollary  considers  a  case  with  m 
variables  and  m  —  2  degrees  of  freedom. 

Corollary 2 Let $F \in \mathbb{Z}^{2\times m}$ be such that any $2 \times 2$ submatrix of $F$ is non-singular.
Let $X$ satisfy $FX = 0$. The non-singular condition on $F$ implies that any $m - 2$
variables specify the other two. Assume that $m > 4$, and for convenience let
$Z = (X_5, \ldots, X_m)$ and $\tilde{Z} = (\tilde{X}_5, \ldots, \tilde{X}_m)$. If $\tilde{X}$ satisfies

\begin{align}
(\tilde{X}_1, \tilde{X}_2, \tilde{Z}) &\sim (X_1, X_2, Z), \tag{2.62}\\
(\tilde{X}_3, \tilde{X}_4, \tilde{Z}) &\sim (X_3, X_4, Z), \tag{2.63}\\
(\tilde{X}_1, \tilde{X}_3) &\sim (X_1, X_3), \tag{2.64}\\
(\tilde{X}_2, \tilde{X}_4) &\sim (X_2, X_4), \tag{2.65}\\
(\tilde{X}_1, \tilde{X}_4) &\sim (X_1, X_4) \tag{2.66}
\end{align}

then $\tilde{X} \sim X$. Fig. 2.7 diagrams the constraints on $\tilde{X}$.

Proof: We prove Corollary 2 with two applications of Corollary 1. First, consider
the group of variables $(X_1, X_2, X_4, Z)$. These $m - 1$ variables are subject to a
single linear constraint, as in Corollary 1. From (2.62), (2.65), and (2.66) we have
all pairwise marginal constraints, satisfying (2.60). Furthermore, (2.62) satisfies
(2.61). We may therefore apply Corollary 1 to conclude

\[
(\tilde{X}_1, \tilde{X}_2, \tilde{X}_4, \tilde{Z}) \sim (X_1, X_2, X_4, Z). \tag{2.67}
\]

A similar application of Corollary 1 using (2.63), (2.64), and (2.66) allows us to
conclude

\[
(\tilde{X}_1, \tilde{X}_3, \tilde{X}_4, \tilde{Z}) \sim (X_1, X_3, X_4, Z). \tag{2.68}
\]

Observe that (2.67) and (2.68) share the $m - 2$ variables $(\tilde{X}_1, \tilde{X}_4, \tilde{Z})$, which together
determine $\tilde{X}_2$ and $\tilde{X}_3$ in exactly the same way that $(X_1, X_4, Z)$ determine $X_2$ and
$X_3$. Therefore we may combine (2.67) and (2.68) to conclude $\tilde{X} \sim X$. □

Not all five constraints (2.62)-(2.66) are always necessary, and we may sometimes
apply Theorem 5 without (2.66). However, this depends on an interesting
additional property of the linear constraint matrix $F$, as stated in the third and
final corollary to Theorem 5.

Corollary 3 Let $F \in \mathbb{Z}^{2\times m}$ be such that any $2 \times 2$ submatrix of $F$ is non-singular,
and let $X$ satisfy $FX = 0$. In addition, assume

\[
|K_{X_1 X_2 Z}|\,|K_{X_3 X_4 Z}|\,|K_{X_1 X_3 Z}|\,|K_{X_2 X_4 Z}| < 0 \tag{2.69}
\]

where again $K$ is a basis for the null space of $F$, and $K_{X_S}$ for $S \subset \{1,\ldots,m\}$ is
the matrix made up of the rows of $K$ corresponding to the variables $(X_i)_{i \in S}$. If $\tilde{X}$
satisfies (2.62)-(2.65) (Fig. 2.7 diagrams these constraints), then $\tilde{X} \sim X$.

Proof: Either (2.62) or (2.63) satisfies the second condition in Theorem 5. To
verify the first condition, first let $G = \sum_{l} A_l^T \Sigma_l A_l$. In the four constraints (2.62)-(2.65),
each pair of variables appears together except for $(X_1, X_4)$ and $(X_2, X_3)$.
Therefore, for suitable choices of $\Sigma_l$, we can construct any $G$ satisfying
$G_{1,4} = G_{2,3} = G_{3,2} = G_{4,1} = 0$. We must show that such a $G$ exists satisfying

\[
F^T C F = G \tag{2.70}
\]

for some positive definite $C$.


We build $G$ row by row. By (2.70), each row of $G$ is a linear combination of
rows of $F$; i.e., it forms the coefficients of a linear equality constraint imposed on
the random vector $X$. Since $G_{1,4} = 0$, the first row of $G$ represents a linear constraint
on the variables $X_1, X_2, X_3, Z$. Since any $m - 2$ variables specify the other two,
there is exactly one linear equality constraint on these $m - 1$ variables, up to a
constant. This constraint can be written as

\[
\begin{vmatrix}
X_1 & K_{X_1} \\
X_2 & K_{X_2} \\
X_3 & K_{X_3} \\
Z & K_Z
\end{vmatrix} = 0 \tag{2.71}
\]

since the vector $(X_1, X_2, X_3, Z)$ forms a linear combination of the columns of
$K_{X_1 X_2 X_3 Z}$. Hence, the first row of $G$ is a constant multiple of the coefficients
in (2.71). In particular,

\begin{align}
G_{1,1} &= \alpha |K_{X_2 X_3 Z}|, \tag{2.72}\\
G_{1,2} &= -\alpha |K_{X_1 X_3 Z}| \tag{2.73}
\end{align}

for some constant $\alpha$. Since $G_{2,3} = 0$, the second row of $G$ represents the linear
constraint on $X_1, X_2, X_4, Z$. Using similar reasoning as above gives

\begin{align}
G_{2,1} &= \beta |K_{X_2 X_4 Z}|, \tag{2.74}\\
G_{2,2} &= -\beta |K_{X_1 X_4 Z}| \tag{2.75}
\end{align}

for some constant $\beta$. Moreover, by (2.70) $G$ is symmetric, so $G_{1,2} = G_{2,1}$, and by
(2.73) and (2.74)

\[
\beta = -\alpha \frac{|K_{X_1 X_3 Z}|}{|K_{X_2 X_4 Z}|}. \tag{2.76}
\]

Positive definiteness of $C$ is equivalent to positive definiteness of the upper left
$2 \times 2$ block of $G$, so the conditions we need are

\begin{align}
0 &< G_{1,1} = \alpha |K_{X_2 X_3 Z}|, \tag{2.77}\\
0 &< G_{1,1} G_{2,2} - G_{1,2} G_{2,1} \tag{2.78}\\
  &= \alpha^2 \left[ \frac{|K_{X_2 X_3 Z}|\,|K_{X_1 X_4 Z}|\,|K_{X_1 X_3 Z}|}{|K_{X_2 X_4 Z}|} - |K_{X_1 X_3 Z}|^2 \right]. \tag{2.79}
\end{align}

We may choose $\alpha$ to trivially satisfy (2.77), and positivity of (2.79) is equivalent to

\[
|K_{X_1 X_3 Z}|\,|K_{X_2 X_4 Z}| \left( |K_{X_2 X_3 Z}|\,|K_{X_1 X_4 Z}| - |K_{X_2 X_4 Z}|\,|K_{X_1 X_3 Z}| \right) > 0 \tag{2.80}
\]

which may also be written as (2.69). □
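The last step of the proof, rewriting (2.80) as (2.69), is not spelled out. One way to see it (our reading, not a step stated in the text) is through the three-term Plücker relation among the determinants, $|K_{X_1X_2Z}||K_{X_3X_4Z}| - |K_{X_1X_3Z}||K_{X_2X_4Z}| + |K_{X_1X_4Z}||K_{X_2X_3Z}| = 0$, which turns the bracket in (2.80) into $-|K_{X_1X_2Z}||K_{X_3X_4Z}|$. The Python sketch below checks this numerically for $m = 5$, so that $Z$ is a single variable and each determinant is $3 \times 3$. The matrices are random integer matrices used purely for illustration, and the helper names are ours.

```python
import random


def det3(rows):
    """Determinant of a 3x3 integer matrix given as three rows."""
    a, b, c = rows
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def minor(K, i, j):
    """|K_{X_i X_j Z}| for m = 5: rows i and j (1-based) stacked on the Z row."""
    return det3([K[i - 1], K[j - 1], K[4]])


def check(K):
    """Return the Pluecker expression and the left sides of (2.80) and (2.69)."""
    p12, p34 = minor(K, 1, 2), minor(K, 3, 4)
    p13, p24 = minor(K, 1, 3), minor(K, 2, 4)
    p14, p23 = minor(K, 1, 4), minor(K, 2, 3)
    pluecker = p12 * p34 - p13 * p24 + p14 * p23
    lhs_280 = p13 * p24 * (p23 * p14 - p24 * p13)
    lhs_269 = p12 * p34 * p13 * p24
    return pluecker, lhs_280, lhs_269


random.seed(0)
results = []
for _ in range(200):
    K = [[random.randint(-5, 5) for _ in range(3)] for _ in range(5)]
    results.append(check(K))

# The relation vanishes identically, and (2.80) is exactly the negative of (2.69).
all_consistent = all(p == 0 and a == -b for p, a, b in results)
```

Since the left side of (2.80) equals the negative of the left side of (2.69) for every matrix tried, requiring (2.80) to be positive is the same as requiring (2.69) to be negative.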


The necessity of satisfying (2.69) in order to apply Theorem 5 substantially
complicates code design. When building a linear code, one need only worry about
the rank of certain matrices; i.e., certain determinants need be nonzero. Here, we
see that the signs of these determinants may be constrained as well.

2.10 Proof of Theorem 4


To prove Theorem 4, we need to specify a Polytope Code for each network satisfying
conditions 1-3 in the statement of the theorem. This involves specifying
the linear relationships between various symbols in the network, the comparisons
that are done among them at internal nodes, and then how the destination uses
the comparison information it receives to decode. We then proceed to prove that
the destination always decodes correctly. The key observation in the proof is that
the important comparisons that go on inside the network are those that involve a
variable that does not reach the destination. This is because those symbols that
do reach the destination can be examined there, so further comparisons inside
the network do not add anything. Therefore we will carefully route these
non-destination symbols to maximize the utility of their comparisons. In particular,
we design these paths so that for every node having one direct edge to the destination
and one other output edge, the output edge not going to the destination
holds a non-destination variable. The advantage of this is that any variable, before
exiting the network, is guaranteed to cross a non-destination variable at a node
where the two variables may be compared. The existence of non-destination paths
with this property depends on the planarity of the network. This is described in
much more detail in the sequel.

Notation: For an edge $e \in E$, with $e = (i, j)$, where $i, j \in V$, let head$(e) = i$
and tail$(e) = j$. For a node $i \in V$, let $\mathcal{E}_{in}(i)$ be the set of edges $e$ with tail$(e) = i$,
and let $\mathcal{E}_{out}(i)$ be the set of edges $e$ with head$(e) = i$. Let $N_{in}(i)$ be the set of input
neighbors of $i$; that is, the set of head$(e)$ for each $e \in \mathcal{E}_{in}(i)$. Similarly, let $N_{out}(i)$
be the set of output neighbors of $i$. For integers $a, b$, let $V_{a,b}$ be the set of nodes
with $a$ inputs and $b$ outputs. We will sometimes refer to such nodes as a-to-b. For
$l \in \{1, 2\}$, let $\bar{l} = 3 - l$. A path is defined as an ordered list of edges $e_1, \ldots, e_k$,
satisfying tail$(e_l)$ = head$(e_{l+1})$ for $l = 1, \ldots, k - 1$. The head and tail of a path are
defined as head$(e_1)$ and tail$(e_k)$ respectively. A node $i$ is said to reach a node $j$ if
there exists a path with head $i$ and tail $j$. By convention, a node can reach itself.
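The notation above can be made concrete with a minimal Python sketch. The toy network `edges` (a source `S`, a destination `D`, and three internal nodes) is a hypothetical example of ours, as are all function names. Note the convention of the text: head(e) is the node an edge leaves and tail(e) is the node it enters.

```python
# Each edge is a (head, tail) pair; repeated pairs model parallel edges.
edges = [
    ("S", "a"), ("S", "a"), ("S", "b"), ("S", "b"),
    ("a", "D"), ("a", "c"), ("b", "D"), ("b", "c"), ("c", "D"),
]


def head(e):
    return e[0]


def tail(e):
    return e[1]


def edges_in(i):
    """E_in(i): edges whose tail is node i."""
    return [e for e in edges if tail(e) == i]


def edges_out(i):
    """E_out(i): edges whose head is node i."""
    return [e for e in edges if head(e) == i]


def n_in(i):
    return {head(e) for e in edges_in(i)}


def n_out(i):
    return {tail(e) for e in edges_out(i)}


def node_type(i):
    """Return (a, b) for an a-to-b node."""
    return (len(edges_in(i)), len(edges_out(i)))


def is_path(edge_list):
    """Check tail(e_l) == head(e_{l+1}) for consecutive edges."""
    return all(tail(e) == head(f) for e, f in zip(edge_list, edge_list[1:]))


def reaches(i, j):
    """True if some path has head i and tail j; a node reaches itself."""
    seen, stack = {i}, [i]
    while stack:
        node = stack.pop()
        if node == j:
            return True
        for nxt in n_out(node):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

In this toy network, `a` and `b` are 2-to-2 nodes, `c` is a 2-to-1 node, and there is no path between `a` and `b`.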

Consider  an  arbitrary  network  satisfying  the  conditions  of  Theorem  4.  By 
condition  (3),  no  node  has  more  output  edges  than  input  edges.  Therefore  the 
min-cut  is  that  between  the  destination  and  the  rest  of  the  network.  Let  M  be  the 
value  of  this  cut;  i.e.,  the  number  of  edges  connected  to  the  destination.  We  now 
state  a  lemma  giving  instances  of  the  cut-set  upper  bound  on  capacity  in  terms  of 
quantities  that  make  the  bound  easier  to  handle  than  Theorem  3  itself.  We  will 
subsequently  show  that  the  minimum  upper  bound  given  by  Lemma  2  is  achievable 
using  a  Polytope  Code;  therefore,  the  cut-set  bound  gives  the  capacity. 

Lemma 2 For $i, j \in V$, let $d_{i,j}$ be the sum of $|\mathcal{E}_{in}(k)| - |\mathcal{E}_{out}(k)|$ for all nodes
$k$ reachable from either $i$ or $j$, not including $i$ or $j$. That is, if $k$ is a-to-b, it
contributes $a - b$ to the sum. Recall that this difference is never negative. Let $c_i$
be the total number of output edges from node $i$, and let $e_i$ be the number of output
edges from node $i$ that go directly to the destination. For any distinct pair of nodes
$i_1, i_2$,

\[
C \le M - e_{i_1} - e_{i_2}. \tag{2.81}
\]

Moreover, if there is no path between $i_1$ and $i_2$,

\[
C \le M + d_{i_1,i_2} - c_{i_1} - c_{i_2}. \tag{2.82}
\]

Proof: Applying Theorem 3 with $A = V \setminus \{D\}$, $T = \{i_1, i_2\}$ immediately gives
(2.81). To prove (2.82), we apply Theorem 3 with $T = \{i_1, i_2\}$, and

\[
A = \{k \in V : k \text{ is not reachable from } i_1 \text{ or } i_2\} \cup \{i_1, i_2\}. \tag{2.83}
\]

Observe that there are no backwards edges for the cut $A$, because any node in $A^c$ is
reachable from either $i_1$ or $i_2$, so for any edge $(j, k)$ with $j \in A^c$, $k$ is also reachable
from $i_1$ or $i_2$, so $k$ is also not in $A$. Therefore we may apply Theorem 3. Since
all output neighbors of $i_1$ and $i_2$ are not in $A$, each output edge of $i_1$ and $i_2$ crosses
the cut. Hence (2.4) becomes

\[
C \le |\{e \in E : \text{head}(e) \in A, \text{tail}(e) \notin A\}| - c_{i_1} - c_{i_2}. \tag{2.84}
\]

Since no node in the network has more output edges than input edges, the difference
between the first term in (2.84) (the number of edges crossing the cut) and $M$ is
exactly the sum of $|\mathcal{E}_{in}(k)| - |\mathcal{E}_{out}(k)|$ for all $k \in A^c$. Hence

\[
|\{e \in E : \text{head}(e) \in A, \text{tail}(e) \notin A\}| - M = d_{i_1,i_2}. \tag{2.85}
\]

Combining (2.84) with (2.85) gives (2.82). □
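The quantities in Lemma 2 are easy to compute mechanically. The following Python sketch evaluates both bounds on a small hypothetical network of our own (source `S`, destination `D`, 2-to-2 nodes `a` and `b`, 2-to-1 node `c`), and cross-checks (2.85). One assumption is labeled explicitly: we exclude the destination itself from the sum defining $d_{i,j}$, since that is the reading under which (2.85) holds. All function names are ours.

```python
# Edges are (head, tail) pairs, with head the node an edge leaves.
edges = [
    ("S", "a"), ("S", "a"), ("S", "b"), ("S", "b"),
    ("a", "D"), ("a", "c"), ("b", "D"), ("b", "c"), ("c", "D"),
]
DEST = "D"
nodes = {n for e in edges for n in e}


def in_edges(i):
    return [e for e in edges if e[1] == i]


def out_edges(i):
    return [e for e in edges if e[0] == i]


def reachable(start):
    """Set of nodes reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for _, nxt in out_edges(node):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def d(i, j):
    # Assumption: the destination is left out of the sum (see lead-in).
    ks = (reachable(i) | reachable(j)) - {i, j, DEST}
    return sum(len(in_edges(k)) - len(out_edges(k)) for k in ks)


def c(i):
    return len(out_edges(i))


def e_direct(i):
    return sum(1 for e in out_edges(i) if e[1] == DEST)


M = len(in_edges(DEST))


def bound_281(i1, i2):
    return M - e_direct(i1) - e_direct(i2)


def bound_282(i1, i2):
    if i2 in reachable(i1) or i1 in reachable(i2):
        raise ValueError("(2.82) requires no path between the two nodes")
    return M + d(i1, i2) - c(i1) - c(i2)


def crossing_edges(i1, i2):
    """Number of edges crossing the cut A of (2.83)."""
    A = (nodes - (reachable(i1) | reachable(i2))) | {i1, i2}
    return sum(1 for h, t in edges if h in A and t not in A)
```

On this toy network, $M = 3$, (2.81) gives a bound of 1 and (2.82) gives a bound of 0 for the pair (`a`, `b`); the network is only meant to exercise the definitions.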

Next,  we  show  that  we  may  transform  any  network  satisfying  the  conditions 
of  Theorem  4  into  an  equivalent  one  that  is  planar,  and  made  up  of  just  2-to-2 
nodes  and  2-to-l  nodes.  We  will  go  on  to  show  that  the  upper  bound  provided  by 
Lemma  2  is  achievable  for  any  such  network,  so  it  will  be  enough  to  prove  that  a 
transformation  exists  that  preserves  planarity,  does  not  reduce  capacity,  and  does 
not  change  the  bound  given  by  Lemma  2. 

We first replace any a-to-b node $i$ with a cascade of $a - b$ 2-to-1 nodes followed
by a b-to-b node. This transformation is illustrated in Fig. 2.8. Denote the b-to-b
node in the transformation $i^*$. Since no node in the original network has more than
two output edges, the resulting network contains only 1-to-1 nodes, 2-to-2 nodes,
and 2-to-1 nodes. We will shortly argue that the 1-to-1 nodes may be removed
as well. Certainly these transformations maintain the planarity of the network.
Moreover, any rate achievable on the transformed network is also achievable on
the original network. This is because if node $i$ is transformed via this operation
into several nodes, any coding operation performed by these nodes can certainly
be performed by node $i$. Additionally, the traitor taking control of node $i$ in the
original network does exactly as much damage as the traitor taking control of $i^*$
in the transformed network, since it controls all edges sent to other nodes. Now
consider the minimum upper bound given by Lemma 2 after this transformation.
The only nodes with positive $e_j$ values will be $i^*$ nodes, and $e_{i^*} = e_i$. Hence
(2.81) cannot change. In (2.82), if we take $i_1^*$ and $i_2^*$, then the bound is the same
in the transformed network. Taking one of the 2-to-1 nodes instead of an $i^*$ node
cannot result in a lower bound, because they have no more output edges, so no
higher $c$ values, and no fewer reachable nodes with fewer outputs than inputs, so no
smaller $d$ values. Therefore, the minimal bound given by (2.82) for the transformed
network is the same as that of the original network. Moreover, in the transformed
network $d_{i_1,i_2}$ is equal simply to the number of 2-to-1 nodes reachable from $i_1$ or
$i_2$, not including $i_1, i_2$.

Figure 2.8: An illustration of the transformation from a 4-to-2 node to an equivalent
set of 2-to-1 and 2-to-2 nodes.
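The node-splitting step can be sketched in a few lines of Python. The text does not fix how the cascade is wired, so the chain wiring used here (each 2-to-1 node merges the running edge with one fresh input) is an assumption of ours, as are the function name `split_node` and the naming scheme for the new nodes and input placeholders.

```python
def split_node(name, a, b):
    """Replace an a-to-b node by (a - b) 2-to-1 nodes and one b-to-b node.

    Returns (nodes, edges): nodes maps each new node to its (inputs, outputs)
    counts, and edges lists (source, target) pairs, where sources named
    "<name>.in<k>" stand for the original input edges of the node.
    """
    if not a >= b >= 1:
        raise ValueError("expected a >= b >= 1")
    inputs = [f"{name}.in{k}" for k in range(a)]
    nodes = {}
    edges = []
    carried, rest = inputs[0], inputs[1:]
    for k in range(a - b):
        merger = f"{name}.m{k}"          # hypothetical name for a 2-to-1 node
        nodes[merger] = (2, 1)
        edges.append((carried, merger))  # running edge of the cascade
        edges.append((rest.pop(0), merger))
        carried = merger
    star = f"{name}*"                    # the b-to-b node, i* in the text
    nodes[star] = (b, b)
    edges.append((carried, star))
    edges.extend((src, star) for src in rest)
    return nodes, edges


nodes_42, edges_42 = split_node("i", 4, 2)
```

For the 4-to-2 case of Fig. 2.8 this produces two 2-to-1 nodes feeding a single 2-to-2 node, so exactly two edges enter `i*`.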

We may additionally transform the network to remove 1-to-1 nodes, simply
by replacing the node and the two edges connected to it by a single edge. The
traitor can always take over the preceding or subsequent node and have at least
as much power. The only exception is when the 1-to-1 node is connected only to
the source and destination. In this case, instead of removing the node, we may
add an additional edge to it from the source, turning it into a 2-to-1 node. Such
a transformation does not change the capacity, nor the planarity or the Lemma 2
bounds.

We also assume without loss of generality that all nodes in the network are
reachable from the source. Certainly edges out of nodes that are not reachable from
the source cannot carry any information about the message, so we may simply
discard this portion of the network, if it exists, without changing the capacity.

We will show that the smallest bound given by Lemma 2 is achievable using a
Polytope Code. If we take $i_1$ and $i_2$ to be two nodes with at least one direct link to
the destination, (2.81) gives that the capacity is no more than $M - 2$. Moreover,
since $e_i \le c_i \le 2$ for any node $i$, neither (2.81) nor (2.82) can produce a bound less
than $M - 4$. Therefore the minimum bound given by Lemma 2 can take on only
three possible values: $M - 4$, $M - 3$, $M - 2$. It is not hard to see that $M - 4$ is
trivially achievable; indeed, even with a linear code. Therefore the only interesting
cases are when the cut-set bound is $M - 3$ or $M - 2$. We begin with the latter,
because the proof is more involved, and contains all the necessary parts to prove
the $M - 3$ case. The $M - 3$ proof is subsequently given in Section 2.10.5.

Assume  that  the  right  hand  sides  of  (2.81)  and  (2.82)  are  never  smaller  than 
M  —  2.  We  describe  the  construction  of  the  Polytope  Code  to  achieve  rate  M  —  2 
in  several  steps.  The  correctness  of  the  code  will  be  proved  in  Lemmas  3-6,  which 
are  stated  during  the  description  of  the  construction  process.  These  Lemmas  are 
then  proved  in  Sections  2.10.1-2.10.4. 

1) Edge Labeling: We first label all the edges in the network except those in
$\mathcal{E}_{in}(D)$. These labels are denoted by the following functions

\begin{align}
\phi &: E \setminus \mathcal{E}_{in}(D) \to V_{2,1} \tag{2.86}\\
\psi &: E \setminus \mathcal{E}_{in}(D) \to \{0, 1\}. \tag{2.87}
\end{align}

For a 2-to-1 node $v$, let $\Lambda(v)$ be the set of edges $e$ with $\phi(e) = v$. The set
$\Lambda(v)$ represents the edges carrying symbols that interact with the non-destination
symbol that terminates at node $v$. The set of edges with $\phi(e) = v$ and $\psi(e) = 1$
represent the path taken by the non-destination symbol that terminates at node
$v$. The following Lemma states the existence of labels $\phi, \psi$ with the necessary
properties.

Lemma 3 There exist functions $\phi$ and $\psi$ with the following properties:

A. The set of edges $e$ with $\phi(e) = v$ and $\psi(e) = 1$ form a path.

B. If $\phi(e) = v$, then either tail$(e) = v$ or there is an edge $e'$ with head$(e')$ = tail$(e)$
and $\phi(e') = v$.

C. For every 2-to-2 node $i$ with output edges $e_1, e_2$, either $\psi(e_1) = 1$, $\psi(e_2) = 1$, or
$\phi(e_1) \ne \phi(e_2)$.

Note that if property (B) holds, $\Lambda(v)$ is a union of paths ending at $v$. From property
(A), the edges on one of these paths satisfy $\psi(e) = 1$.

2) Internal Node Operation: Assume that $\phi$ and $\psi$ are defined to satisfy properties
(A)-(C) in Lemma 3. Given these labels, we will specify how internal nodes
in the network operate. Every edge in the network will hold a symbol representing
a linear combination of the message, as well as possibly some comparison bits. We
also define a function

\[
\rho : E \to \{1, \ldots, |\mathcal{E}_{out}(S)|\} \tag{2.88}
\]

that will serve as an accounting tool to track symbols as they pass through the
network. We begin by assigning distinct and arbitrary values to $\rho(e)$ for all
$e \in \mathcal{E}_{out}(S)$ ($\rho$ therefore constitutes an ordering on $\mathcal{E}_{out}(S)$). Further assignments of $\rho$
will be made recursively. This will be made explicit below, but if a symbol is merely
forwarded, it travels along edges with a constant $\rho$. When linear combinations
occur at internal nodes, $\rho$ values are manipulated, and the $\rho$ values determine exactly how
this is done.

For every node $i$ with 2 input edges, let $f_1, f_2$ be these edges. If $i$ is 2-to-2,
let $e_1, e_2$ be its two output edges; if it is 2-to-1, let $e$ be its output edge. If
$\phi(f_1) = \phi(f_2)$, then node $i$ compares the symbols on $f_1$ and $f_2$. If node $i$ is 2-to-2,
then $\phi(e_l) = \phi(f_1)$ for either $l = 1$ or 2. Node $i$ transmits its comparison bit on
$e_l$. If node $i$ is 2-to-1, then it transmits its comparison bit on $e$. All 2-to-2 nodes
forward all received comparison bits on the output edge with the same $\phi$ value as
the input edge on which the bit was received. All 2-to-1 nodes forward all received
comparison bits on their output edge.

We divide nodes in $V_{2,2}$ into the following sets. The linear transformation
performed at node $i$ will depend on which of these sets it is in.

\begin{align}
W_1 &= \{i \in V_{2,2} : \psi(f_1) = \psi(f_2) = 0,\ \phi(f_1) \ne \phi(f_2)\} \tag{2.89}\\
W_2 &= \{i \in V_{2,2} : \psi(f_1) = \psi(f_2) = 0,\ \phi(f_1) = \phi(f_2)\} \tag{2.90}\\
W_3 &= \{i \in V_{2,2} : \psi(f_1) = 1 \text{ or } \psi(f_2) = 1\} \tag{2.91}
\end{align}

We will sometimes refer to nodes in $W_2$ as branch nodes, since they represent
branches in $\Lambda(\phi(f_1))$. Moreover, branch nodes are significant because a failed
comparison at a branch node will cause the forwarding pattern within $\Lambda(\phi(f_1))$ to
change. For an edge $e$, $X_e$ denotes the symbol transmitted on $e$. The following
gives the relationships between these symbols, which are determined by internal
nodes, depending partially on the comparison bits they receive. For each node $i$,
the action of node $i$ depends on which set it falls in as follows:

• $W_1$: Let $l$ be such that $\phi(e_l) = \phi(f_1)$. The symbol on $f_1$ is forwarded
to $e_l$, and the symbol on $f_2$ is forwarded onto $e_{\bar{l}}$. Set $\rho(e_l) = \rho(f_1)$, and
$\rho(e_{\bar{l}}) = \rho(f_2)$.

• $W_2$: Let $l$ be such that $\phi(e_l) = \phi(f_1) = \phi(f_2)$. Let $l'$ be such that
$\rho(f_{l'}) < \rho(f_{\bar{l}'})$. We will show in Lemma 4 that our construction is such that
$\rho(f_1) \ne \rho(f_2)$ at all nodes, so $l'$ is well defined. If neither $f_1$ nor $f_2$ hold a failed
comparison bit, the output symbols are

\begin{align}
X_{e_l} &= \gamma_{i,1} X_{f_1} + \gamma_{i,2} X_{f_2} \tag{2.92}\\
X_{e_{\bar{l}}} &= X_{f_{l'}} \tag{2.93}
\end{align}

where coefficients $\gamma_{i,1}, \gamma_{i,2}$ are nonzero integers to be chosen later. Set output
$\rho$ values to

\begin{align}
\rho(e_l) &= \rho(f_{\bar{l}'}) \tag{2.94}\\
\rho(e_{\bar{l}}) &= \rho(f_{l'}). \tag{2.95}
\end{align}

Note that the symbol on the input edge with smaller $\rho$ value is forwarded
without linear combination. If the input edge $f_{l'}$ reports a failed comparison
anywhere previously in $\Lambda(\phi(f_1))$, then (2.93) changes to

\[
X_{e_{\bar{l}}} = X_{f_{\bar{l}'}}. \tag{2.96}
\]

• $W_3$: Let $l$ be such that $\psi(f_l) = 1$, and $l'$ be such that $\psi(e_{l'}) = 1$ and
$\phi(e_{l'}) = \phi(f_l)$. The symbol on $f_l$ is forwarded to $e_{l'}$, and the symbol on
$f_{\bar{l}}$ is forwarded to $e_{\bar{l}'}$, with the following exception. If $\phi(f_1) = \phi(f_2)$ and
there is a failed comparison bit sent from $f_{\bar{l}}$, then the forwarding swaps: the
symbol on $f_l$ is forwarded to $e_{\bar{l}'}$, and the symbol on $f_{\bar{l}}$ is forwarded to $e_{l'}$.
Set $\rho(e_{l'}) = \rho(f_l)$ and $\rho(e_{\bar{l}'}) = \rho(f_{\bar{l}})$. Again, $\rho$ is consistent along forwarded
symbols, but only when all comparisons succeed.

• $V_{2,1}$: Let $l$ be such that $\psi(f_l) = 1$. The symbol from $f_{\bar{l}}$ is forwarded on $e$,
unless there is a failed comparison bit sent from $f_{\bar{l}}$, in which case the symbol
from $f_l$ is forwarded on $e$. Set $\rho(e) = \rho(f_{\bar{l}})$.

See  Fig.  2.9  for  an  illustration  of  the  linear  transformations  performed  at  internal 
nodes  and  how  they  change  when  a  comparison  fails.  The  following  Lemma  gives 
some  properties  of  the  internal  network  behavior  as  prescribed  above. 
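The case analysis above can be sketched in Python for the situation in which no comparison fails. The sketch reflects our reading of (2.89)-(2.95): a 2-to-2 node is classified from the labels of its two input edges, and a branch node sends the linear combination along the edge that stays in $\Lambda(v)$ while forwarding the symbol with the smaller $\rho$ unchanged. The function names, the dictionary keys, and the sample coefficients are hypothetical; the failure cases are deliberately left out.

```python
def classify(phi_f1, psi_f1, phi_f2, psi_f2):
    """Return which of W1, W2, W3 a 2-to-2 node belongs to, per (2.89)-(2.91)."""
    if psi_f1 == 1 or psi_f2 == 1:
        return "W3"
    if phi_f1 != phi_f2:
        return "W1"
    return "W2"


def branch_outputs(x_f1, rho_f1, x_f2, rho_f2, gamma1, gamma2):
    """No-failure rule at a branch node, following (2.92)-(2.95).

    Returns a dict mapping each output edge to its (symbol, rho) pair:
    "e_l" stays in Lambda(v) and carries the linear combination with the
    larger rho; "e_lbar" carries the forwarded symbol with the smaller rho.
    """
    if rho_f1 == rho_f2:
        raise ValueError("rho values on the two inputs must differ")
    if gamma1 == 0 or gamma2 == 0:
        raise ValueError("coefficients must be nonzero")
    combined = gamma1 * x_f1 + gamma2 * x_f2
    if rho_f1 < rho_f2:
        forwarded, rho_small, rho_large = x_f1, rho_f1, rho_f2
    else:
        forwarded, rho_small, rho_large = x_f2, rho_f2, rho_f1
    return {"e_l": (combined, rho_large), "e_lbar": (forwarded, rho_small)}


# Hypothetical inputs: symbols 5 and 7 with rho values 1 and 3.
example = branch_outputs(5, 1, 7, 3, gamma1=2, gamma2=-1)
```

Here the symbol 5 (smaller $\rho$) leaves on `e_lbar` unchanged, while the combination $2 \cdot 5 - 7 = 3$ continues along `e_l` with $\rho = 3$.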

Lemma 4 The following hold:

1. For any integer $a \in \{1, \ldots, |\mathcal{E}_{out}(S)|\}$, the set of edges $e$ with $\rho(e) = a$
form a path (we refer to this in the sequel as the $\rho = a$ path). Consequently,
there is no node $i$ with input edges $f_1, f_2$ such that $\rho(f_1) = \rho(f_2)$.

2. If there are no failed comparisons that occur in the network, then the linear
transformations are such that the decoder can decode any symbol in the
network except those on non-destination paths.

3. Suppose a comparison fails at a branch node $k$ with input edges $f_1, f_2$ with
$v = \phi(f_1) = \phi(f_2)$. Assume without loss of generality that $\rho(f_1) < \rho(f_2)$.
The forwarding pattern within $\Lambda(v)$ changes such that symbols sent along
the $\rho = \rho(f_2)$ path are not decodable at the destination, but what was the
non-destination symbol associated with $v$ is decodable.

Figure 2.9: An example of the linear transformations performed in $\Lambda(v)$ for some
$v$ (labeled as such). Solid edges denote $\phi(e) = v$, dashed edges denote $\phi(e) \ne v$.
Thick edges denote $\psi(e) = 1$. Near the head of each edge is the corresponding $\rho$
value. Also shown is the symbol transmitted along that edge, given initial symbols
a-i at the furthest upstream edges in the network. When several symbols are
written on an edge, this indicates that the edge carries a linear combination of
those symbols. The symbols indicated in brackets are those carried by the edges
when the comparison at the indicated black node fails. Symbols on edges labeled
without brackets do not change when the comparison fails.

3) MDS Code Construction: The rules above explain how the symbols are
combined and transformed inside the network. In addition, when the initial set of
symbols are sent into the network from the source, they are subject to linear
constraints. We now describe exactly how this is done. Assume that no comparisons
fail in the network, so the linear relationships between symbols are unmodified.
For a 2-to-1 node $v$, let $e_v^*$ be the edge with $\phi(e_v^*) = v$, $\psi(e_v^*) = 1$, and tail$(e_v^*) = v$;
i.e., it is the last edge to hold the non-destination symbol terminating at $v$. Observe
that it will be enough to specify the linear relationships among the symbols
on $\{e_v^* : v \in V_{2,1}\}$ as well as the $M$ edges in $\mathcal{E}_{in}(D)$. These collectively form the
Polytope Code equivalent of a $(M + |V_{2,1}|, M - 2)$ MDS code. We must construct
this code so as to satisfy certain instances of (2.69), so that we may apply Theorem 5
as necessary. The following Lemma states the existence of a set of linear
relationships among the $M + |V_{2,1}|$ variables with the required properties.


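Lemma 5 For each 2-to-1 node $v$, let $\Xi(v)$ be the set of edges $e$ with tail$(e) = D$
such that there is an edge $e'$ with tail$(e')$ = head$(e)$, $\phi(e') = v$, and $\psi(e') = 1$. That
is, the symbol on $e$, just before being sent to the destination, was compared against
the non-destination symbol associated with $v$. Note that any edge $e \in \mathcal{E}_{in}(D)$ is
contained in $\Xi(v)$ for some 2-to-1 node $v$. There exists a generator matrix
$K \in \mathbb{Z}^{(M + |V_{2,1}|) \times (M - 2)}$ where each row is associated with an edge in
$\{e_v^* : v \in V_{2,1}\} \cup \mathcal{E}_{in}(D)$
such that for all $v_1, v_2 \in V_{2,1}$ and all $f_1 \in \Xi(v_1)$, $f_2 \in \Xi(v_2)$, the constraints

\begin{align}
(\tilde{X}_{f_1}, \tilde{X}_{f_2}, \tilde{Z}) &\sim (X_{f_1}, X_{f_2}, Z) \tag{2.97}\\
(\tilde{X}_{e_{v_1}^*}, \tilde{X}_{e_{v_2}^*}, \tilde{Z}) &\sim (X_{e_{v_1}^*}, X_{e_{v_2}^*}, Z) \tag{2.98}\\
(\tilde{X}_{f_1}, \tilde{X}_{e_{v_1}^*}) &\sim (X_{f_1}, X_{e_{v_1}^*}) \tag{2.99}\\
(\tilde{X}_{f_2}, \tilde{X}_{e_{v_2}^*}) &\sim (X_{f_2}, X_{e_{v_2}^*}) \tag{2.100}
\end{align}

imply

\[
(\tilde{X}_{f_1}, \tilde{X}_{f_2}, \tilde{X}_{e_{v_1}^*}, \tilde{X}_{e_{v_2}^*}, \tilde{Z}) \sim (X_{f_1}, X_{f_2}, X_{e_{v_1}^*}, X_{e_{v_2}^*}, Z) \tag{2.101}
\]

where

\[
Z = (X_e : e \in \mathcal{E}_{in}(D) \setminus \{f_1, f_2\}). \tag{2.102}
\]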

4) Decoding Procedure: To decode, the destination first compiles a list $\mathcal{L} \subset V$
of which nodes may be the traitor. It does this by taking all its available data
(received comparison bits from interior nodes as well as the symbols it has direct
access to) and determining whether it is possible for each node, if it were the traitor,
to have acted in a way to cause these data to occur. If so, it adds this node to
$\mathcal{L}$. For each node $i$, let $K_i$ be the linear transformation from the message vector
$W$ to the symbols on the output edges of node $i$. With a slight abuse of notation,
regard $K_D$ as representing the symbols on the input edges to $D$ instead. For a set of
nodes $S \subset V$, let $K_{D \perp S}$ be a basis for the subspace spanned by $K_D$ orthogonal to

\[
\bigcap_{i \in S} \operatorname{span}(K_i). \tag{2.103}
\]

The destination decodes from $K_{D \perp \mathcal{L}} W$. If $i$ is the traitor, it must be that $i \in \mathcal{L}$,
so

\begin{align}
\operatorname{rank}(K_{D \perp \mathcal{L}}) &\ge M - \dim \operatorname{span}(K_i) \tag{2.104}\\
  &\ge M - \operatorname{rank}(K_i) \tag{2.105}\\
  &\ge M - 2 \tag{2.106}
\end{align}

where we used the fact that node $i$ has at most two output edges. Since $K_{D \perp \mathcal{L}}$ has
rank at least $M - 2$, this is a large enough space for the destination to decode the
entire message. The following Lemma allows us to conclude that all variables in the
subspace spanned by $K_{D \perp \mathcal{L}}$ are trustworthy.

Lemma 6 Consider any pair of nodes $i, j$. Suppose $i$ is the traitor, and acts in a way such that $j \in \mathcal{L}$. Node $i$ cannot have corrupted any value in $K_{D \perp \{i,j\}} W$.
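
The list-compilation step of the decoding procedure can be illustrated with a deliberately simplified sketch. It assumes, hypothetically, that each node is summarized by the set of destination observations it could have influenced, and that a node is a possible traitor exactly when every observation that deviates from its expected value lies in that set. The names `compile_suspects` and `influence` are illustrative and do not come from the text, and the real test (which compares joint types and comparison bits) is not modeled here.

```python
def compile_suspects(influence, deviating):
    """Return the nodes that could explain every deviating observation.

    influence: dict mapping node -> set of observation ids it can affect
               (a hypothetical stand-in for the real consistency test).
    deviating: set of observation ids that differ from what was expected.
    """
    return {node for node, reach in influence.items() if deviating <= reach}


# Toy data: three interior nodes and four observations at the destination.
INFLUENCE = {"a": {0, 1}, "b": {1, 2}, "c": {3}}
SUSPECTS = compile_suspects(INFLUENCE, deviating={1})
```

In this toy model, a deviation that only nodes `a` and `b` can reach leaves both in the list, which mirrors why the decoder must rely only on values that no listed node could have corrupted.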

2.10.1  Proof  of  Lemma  3 

We begin with $\phi(e) = \psi(e) = 0$ for all edges $e$, and set $\phi$ and $\psi$ progressively. First we describe some properties of the graph $(V, E)$ imposed by the fact that the right hand sides of (2.81) and (2.82) are never less than $M - 2$.

Given a 2-to-1 node $v$, let $\Gamma_v$ be the set of nodes for which $v$ is the only reachable 2-to-1 node. Note that other than $v$, the only nodes in $\Gamma_v$ are 2-to-2. Moreover, if $v$ can reach another 2-to-1 node, $\Gamma_v$ is empty. We claim that $\Gamma_v$ forms a path. If it did not, then there would be two 2-to-2 nodes $i_1, i_2 \in \Gamma_v$ for which there is no path between them. That is, $d_{i_1,i_2} = 1$ and $c_{i_1} = c_{i_2} = 2$, so (2.82) becomes $C \le M - 3$, which contradicts our assumption that the cut-set bound is $M - 2$.

Furthermore, every 2-to-2 node must be able to reach at least one 2-to-1 node. If not, then we could follow a path from such a 2-to-2 node until reaching a node $i_1$ all of whose output edges lead directly to the destination. Node $i_1$ cannot be 2-to-1, so it must be 2-to-2, meaning $c_{i_1} = 2$. Taking any other node $i_2$ with a direct link to the destination gives no more than $M - 3$ for the right hand side of (2.81), again contradicting our assumption.

The first step in the edge labeling procedure is to specify the edges holding non-destination symbols; that is, for each 2-to-1 node $v$, to specify the edges $e$ for which $\phi(e) = v$ and $\psi(e) = 1$. To satisfy property (A), these must form a path. For any node $i \in \mathcal{N}_{\text{in}}(D)$, the output edge of $i$ that goes to the destination has no $\phi$ value, so to satisfy property (C), the other output edge $e$ must satisfy $\psi(e) = 1$. Moreover, by property (B), if $\phi(e) = v$, then there is a path from $\text{head}(e)$ to $v$. Hence, if $i \in \mathcal{V}_{2,2} \cap \Gamma_v$ for some 2-to-1 node $v$, then it is impossible for the two output edges of $i$ to have different $\phi$ values; hence, by property (C), one of its output edges $e$ must satisfy $\psi(e) = 1$. Therefore, we need to design the non-destination paths so that they pass through $\Gamma_v$ for each $v$, as well as each node in $\mathcal{N}_{\text{in}}(D)$.

For each 2-to-1 node $v$, we first set the end of the non-destination path associated with $v$ to be the edges in $\Gamma_v$. That is, for an edge $e$, if $\text{head}(e), \text{tail}(e) \in \Gamma_v$, set $\psi(e) = 1$ and $\phi(e) = v$. Now our only task is to extend the paths backwards such that one is guaranteed to pass through each node in $\mathcal{N}_{\text{in}}(D)$.

Figure 2.10: A diagram of the planar embedding being used to prove that a node $k \in \mathcal{N}_{\text{in}}(D)$ on the interior of $C_{i,j}$ is reachable from $i$. Solid lines are single edges; dashed lines represent paths made up of possibly many edges. Thick lines correspond to edges in $C_{i,j}$.

Construct an embedding of the graph $(V, E)$ in the plane such that $S$ is on the exterior face. Such an embedding always exists [90]. If we select a set of edges making up an undirected cycle (that is, edges constituting a cycle on the underlying undirected graph), then all nodes in the network not on the cycle are divided into those on the interior and those on the exterior, according to the planar embedding. Take $i, j \in \mathcal{N}_{\text{in}}(D)$ such that $i$ can reach $j$, and let $C_{i,j}$ be the undirected cycle composed of a path from $i$ to $j$, in addition to the edges $(i, D)$ and $(j, D)$. We claim that if a node $k \in \mathcal{N}_{\text{in}}(D)$ is on the interior of $C_{i,j}$, then it is reachable from $i$. Since $S$ is on the exterior face of the graph, it must be exterior to the cycle $C_{i,j}$. There exists some path from $S$ to $k$, so it must cross $C_{i,j}$ at a node $j'$. Observe that $j'$ must be on the path from $i$ to $j$, so it is reachable from $i$. Therefore $i$ can reach $j'$ and $j'$ can reach $k$, so $i$ can reach $k$. This construction is diagrammed in Fig. 2.10.

We may travel around node $D$ in the planar embedding, noting the order in which the nodes $\mathcal{N}_{\text{in}}(D)$ connect to $D$. Call this order $u_1, \ldots, u_m$. Take any $i \in \mathcal{N}_{\text{in}}(D)$, and suppose $i = u_l$. We claim that the set of nodes in $\mathcal{N}_{\text{in}}(D)$ reachable from $u_l$ forms a contiguous block around $u_l$ in the $\{u\}$ ordering, where we regard $u_1$ and $u_m$ as being adjacent, so two contiguous blocks containing $u_1$ and $u_m$ are considered one contiguous block.
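
The notion of a contiguous block in a cyclic ordering can be made concrete with a short check. The sketch below is only an illustration of the definition, not part of the proof; the function name `is_cyclic_block` and the use of 0-based positions in place of $u_1, \ldots, u_m$ are assumptions made for the example.

```python
def is_cyclic_block(positions, m):
    """True if `positions` (indices in 0..m-1) form one block on a cycle of length m.

    Index m-1 is treated as adjacent to index 0, so a set that wraps
    around the end of the ordering still counts as a single block.
    """
    chosen = set(positions)
    if not chosen:
        return False
    # A block start is a chosen index whose cyclic predecessor is not chosen.
    starts = sum(1 for p in chosen if (p - 1) % m not in chosen)
    # Zero starts means every index is chosen; one start means a single block.
    return starts <= 1
```

For instance, with six positions the set {5, 0, 1} wraps around the end of the ordering and is one block, whereas {0, 2} is split by the unreachable position 1.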

Suppose this were not true. That is, for some $i \in \mathcal{N}_{\text{in}}(D)$ there exists a $j \in \mathcal{N}_{\text{in}}(D)$ reachable from $i$ that is flanked on either side in the $\{u\}$ ordering by nodes $k_1, k_2 \in \mathcal{N}_{\text{in}}(D)$ not reachable from $i$. The order in which these four nodes appear
in $\{u\}$ is some cyclic permutation or reflection of

$$(i, k_1, j, k_2). \tag{2.107}$$

Neither $k_1$ nor $k_2$ can be on the interior of $C_{i,j}$, because, as shown above, any such node is reachable from $i$. However, if they are both on the exterior, then the order in (2.107) cannot occur, because $D$ is on the boundary of $C_{i,j}$.

By contiguity, if a node $i \in \mathcal{N}_{\text{in}}(D)$ can reach any other node in $\mathcal{N}_{\text{in}}(D)$, it can reach a node immediately adjacent to it in the $\{u\}$ ordering. Suppose $i$ can reach both the node $j_1 \in \mathcal{N}_{\text{in}}(D)$ immediately to its left and the node $j_2 \in \mathcal{N}_{\text{in}}(D)$ immediately to its right. We show that in fact $i$ can reach every node in $\mathcal{N}_{\text{in}}(D)$. In particular, there can be only one such node, or else there would be a cycle. Node $i$ has only two output edges, one of which goes directly to $D$. Let $i'$ be the tail of the other. Both $j_1$ and $j_2$ must be reachable from $i'$.

We claim it is impossible for both $j_1$ to be exterior to $C_{i,j_2}$ and $j_2$ to be exterior to $C_{i,j_1}$. Suppose both were true. We show the graph must contain a cycle. Let $C$ be the undirected cycle composed of the path from $i'$ to $j_1$, the path from $i'$ to $j_2$, and the edges $(j_1, D)$, $(j_2, D)$. Every node on $C$ is reachable from $i$. Since both $j_1$ is exterior to $C_{i,j_2}$ and $j_2$ is exterior to $C_{i,j_1}$, it is easy to see that $i$ must be on the interior of $C$. Therefore any path from $S$ to $i$ must cross the cycle at a node $k'$, reachable from $i$. Since $k'$ is on a path from $S$ to $i$, $i$ is also reachable from $k'$, so there is a cycle. See Fig. 2.11 for a diagram of this.

Figure 2.11: A diagram of the planar embedding being used to prove that a node reaching its two neighbors in $\mathcal{N}_{\text{in}}(D)$ can reach every node in $\mathcal{N}_{\text{in}}(D)$. Solid lines are single edges; dashed lines represent paths made up of possibly many edges. Thick lines correspond to the undirected cycle $C$. Undirected cycles $C_{i,j_1}$ and $C_{i,j_2}$ are indicated.

Therefore, we may assume without loss of generality that $j_2$ is in the interior of $C_{i,j_1}$. Suppose there were a node $j_3 \in \mathcal{N}_{\text{in}}(D)$ not reachable from $i$. Node $j_3$ must be on the exterior of $C_{i,j_1}$, because we have shown that nodes in $\mathcal{N}_{\text{in}}(D)$ on the interior are reachable from $i$. Therefore, in the $\{u\}$ order, these four nodes must appear in some cyclic permutation or reflection of $(i, j_3, j_1, j_2)$. However, this is impossible, because both $j_1$ and $j_2$ were assumed to be adjacent to $i$. Therefore, $i$ can reach every node in $\mathcal{N}_{\text{in}}(D)$.

Take a node $i$ that can reach 2-to-1 nodes $v_1, v_2 \in \mathcal{N}_{\text{in}}(D)$. Suppose that $i$ cannot reach every node in $\mathcal{N}_{\text{in}}(D)$. Therefore, the nodes it can reach in $\mathcal{N}_{\text{in}}(D)$ are either entirely to its right or entirely to its left in the $\{u\}$ ordering, or else, by contiguity, node $i$ would be able to reach the adjacent nodes on both sides. Suppose without loss of generality that they are all to its right, and that $v_2$ is further to the right than $v_1$. We claim that $v_1$ is on the interior of $C_{i,v_2}$. Suppose it were on the exterior. By contiguity, every node in $\mathcal{N}_{\text{in}}(D)$ on the exterior of $C_{i,v_2}$ must be reachable from $i$. Since we have already argued that every node in $\mathcal{N}_{\text{in}}(D)$ on the interior of $C_{i,v_2}$ is reachable from $i$, this means $i$ can reach every node in $\mathcal{N}_{\text{in}}(D)$, which we have assumed is not the case.

Therefore, $v_1$ is on the interior of $C_{i,v_2}$. We may construct a path from $S$ to $v_1$, passing through all nodes in $\Gamma_{v_1}$. This path must cross $C_{i,v_2}$ at a node $k$, reachable from $i$. Node $k$ can reach both $v_1$ and $v_2$, so it cannot be in $\Gamma_{v_1}$. However, $k$ is on a path passing through $\Gamma_{v_1}$, so it can reach all nodes in $\Gamma_{v_1}$. Therefore there exists a path from $i$ to $v_1$, passing through $\Gamma_{v_1}$.

If $i$ can reach every node in $\mathcal{N}_{\text{in}}(D)$, then as shown above, either $v_1$ is in the interior of $C_{i,v_2}$, or $v_2$ is in the interior of $C_{i,v_1}$. Therefore, by the same argument as that just used for the case that $i$ cannot reach every node in $\mathcal{N}_{\text{in}}(D)$, there is either a path from $i$ to $v_1$ through $\Gamma_{v_1}$ or a path from $i$ to $v_2$ through $\Gamma_{v_2}$.

Fix a 2-to-1 node $v_1 \in \mathcal{N}_{\text{in}}(D)$. Consider the set of nodes that are:

• contained in $\mathcal{V}_{2,2} \cap \mathcal{N}_{\text{in}}(D)$,

• not in $\Gamma_v$ for any 2-to-1 node $v$,

• can reach $v_1$,

• cannot reach any other node also satisfying the above three conditions.

We claim there are at most two such nodes. Suppose there were two such nodes $i_1, i_2$ both to the left of $v_1$ in the $\{u\}$ ordering. If $i_1$ were further to the left, then $i_1$ could reach $i_2$, since $i_1$ can reach $v_1$ and the nodes reachable from $i_1$ must form a contiguous block. Hence $i_1$ would not qualify. Therefore there can be at most one such node to the left of $v_1$ and at most one to the right. Denote these two nodes $i$ and $j$ respectively, if they exist. By contiguity, every node satisfying the first three conditions must be able to reach either $i$ or $j$. Moreover, all such nodes to the left of $v_1$ form a single path ending in $i$, and those on the right form a single path ending in $j$. We will proceed to extend two non-destination paths backwards to $i$ and $j$. Then, we may further extend these two paths backwards through all nodes in $\mathcal{V}_{2,2} \cap \mathcal{N}_{\text{in}}(D)$ that can reach $v_1$, and then backwards to the source on arbitrary paths. Hence, we need only find paths from $i$ to the head of $\Gamma_v$ for some $v$, and a distinct one of the same for $j$.

Both $i$ and $j$ can reach at least one 2-to-1 node other than $v_1$. Suppose $i$ can reach another 2-to-1 node $v_2 \in \mathcal{N}_{\text{in}}(D)$. By the argument above, there is a path from $i$ to the leftmost of $v_1, v_2$ through $\Gamma_{v_1}$ or $\Gamma_{v_2}$ respectively. Similarly, if $j$ can reach a 2-to-1 node $v_3 \in \mathcal{N}_{\text{in}}(D)$ with $v_3 \ne v_1$, there is a path from $j$ to the rightmost of $v_1, v_3$, through the associated $\Gamma$. This is true even if $v_2 = v_3$.

Suppose there is no 2-to-1 node in $\mathcal{N}_{\text{in}}(D)$ reachable from node $i$ other than $v_1$. There still must be a 2-to-1 node $v_2$ reachable from $i$, though $v_2 \notin \mathcal{N}_{\text{in}}(D)$. Since $v_2$ is not adjacent to the destination, it must be able to reach a 2-to-1 node that is. Therefore $\Gamma_{v_2} = \emptyset$, so any path from $i$ to $v_2$ trivially includes $\Gamma_{v_2}$. If $j$ can also reach no 2-to-1 nodes in $\mathcal{N}_{\text{in}}(D)$ other than $v_1$, there must be some 2-to-1 node $v_3 \notin \mathcal{N}_{\text{in}}(D)$ reachable from $j$. We may therefore select non-destination paths from $i$ to $v_2$ and $j$ to $v_3$, unless $v_2 = v_3$. This only occurs if this single node is the only 2-to-1 node other than $v_1$ reachable by either $i$ or $j$. We claim that in this case, either $i$ or $j$ can reach the tail of $\Gamma_{v_1}$. Therefore we may extend the non-destination path for $v_1$ back to one of $i$ or $j$, and the non-destination path for $v_2 = v_3$ to the other. Every node can reach some 2-to-1 node in $\mathcal{N}_{\text{in}}(D)$, so $v_2$ can reach $v_1$, or else $i$ and $j$ would be able to reach a different 2-to-1 node in $\mathcal{N}_{\text{in}}(D)$. By a similar argument to that used above, $v_1$ must be on the interior of the undirected cycle composed of the path from $i$ to $v_2$, the path from $j$ to $v_2$, and the edges $(i, D)$, $(j, D)$. If not, $v_1$ would not be between $i$ and $j$ in the $\{u\}$ ordering. Note this is true even if $i$ can reach $j$ or vice versa. Since $S$ must be exterior to this cycle, any path from $S$ to $v_1$ including $\Gamma_{v_1}$ must cross either the path from $i$ to $v_2$ or $j$ to $v_2$ at a node $k$. Node $k$ must be able to reach the head of $\Gamma_{v_1}$, so either $i$ or $j$ can reach $\Gamma_{v_1}$.

Once  the  non-destination  paths  are  defined,  we  perform  the  following  algorithm 
to label other edges so as to satisfy property (C). We refer to an edge $e$ as labeled if $\phi(e) \ne 0$. We refer to a node as labeled if any of its output edges are labeled.
Any  node  unlabeled  after  the  specifications  of  the  non-destination  paths  must  not 
be  in  Nin(D),  and  must  be  able  to  reach  at  least  two  different  2-to-l  nodes. 

1. For any edge $e$ such that there exists an $e' \in \mathcal{E}_{\text{out}}(\text{tail}(e))$ with $\psi(e') = 1$, set $\phi(e) = \phi(e')$. Observe now that any path eventually reaches a labeled edge. Furthermore, the tail of any unlabeled edge cannot be a node contained in $\Gamma_v$ for any $v$, so it can lead to at least two 2-to-1 nodes.

2.  Repeat  the  following  until  every  edge  other  than  those  connected  directly  to 
the  destination  is  labeled.  Consider  two  cases: 

• There is no 2-to-2 node with exactly one labeled output edge: Pick an unlabeled node $i$. Select any path of unlabeled edges out of $i$ until reaching a labeled node. Let $v$ be the label of a labeled output edge from this node. For all edges $e$ on the selected path, set $\phi(e) = v$. Observe that every node on this path was previously an unlabeled 2-to-2 node. Hence every node on this path, except the last one, has exactly one labeled output edge.

• There is a 2-to-2 node $i$ with exactly one labeled output edge: Let $v_1$ be the label on the labeled output edge. Select any path of unlabeled edges beginning with the unlabeled output edge from $i$ until reaching a node with an output edge labeled $v_2$ with $v_2 \ne v_1$. This is always possible because any unlabeled edge must be able to lead to at least two 2-to-1 nodes, including one other than $v_1$. For all edges $e$ on the selected path, set $\phi(e) = v_2$. Observe that before we labeled the path, no node in the path other than the last one had an output edge labeled $v_2$, because if it did, we would have stopped there. Hence, after we label the path, if a node now has 2 labeled output edges, they have different labels.

Note that in the above algorithm, whenever an edge $e$ becomes labeled, if there was another edge $e'$ with $\text{head}(e) = \text{head}(e')$, either $e'$ was unlabeled, or $\phi(e) \ne \phi(e')$. Therefore, the final $\phi$ values satisfy property (B).
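
Step 2 of the labeling procedure can be sketched in code. The following is a minimal illustration under stated assumptions, not the construction used in the proof: edges are written as (head, tail) pairs directed from head to tail, labels are plain strings, the function name `label_remaining_edges` is hypothetical, and the toy graph is chosen only to exercise both cases (it is not claimed to be a valid network instance). The search is a depth-first search along unlabeled edges, which is one way of realizing "select any path".

```python
from collections import defaultdict


def label_remaining_edges(edges, phi, dest):
    """Extend a partial labeling `phi` (dict: edge -> label) to all edges
    that do not enter `dest`, following the two cases of step 2."""
    phi = dict(phi)
    out = defaultdict(list)
    for e in edges:
        if e[1] != dest:  # edges into the destination carry no label
            out[e[0]].append(e)

    def labels_at(node):
        return {phi[e] for e in out[node] if e in phi}

    def find_path(edge, avoid):
        """Follow unlabeled edges, starting with `edge`, until reaching a
        node that has an output label outside `avoid`."""
        node = edge[1]
        usable = labels_at(node) - avoid
        if usable:
            return [edge], min(usable)
        for nxt in out[node]:
            if nxt not in phi:
                found = find_path(nxt, avoid)
                if found:
                    return [edge] + found[0], found[1]
        return None

    while True:
        todo = [e for es in out.values() for e in es if e not in phi]
        if not todo:
            return phi
        # Second case: some node already has a labeled output edge.
        half = [e for e in todo if labels_at(e[0])]
        start = half[0] if half else todo[0]
        avoid = labels_at(start[0])  # empty in the first case
        found = find_path(start, avoid)
        if found is None:
            raise ValueError("graph violates the assumptions of the procedure")
        path, label = found
        for e in path:
            phi[e] = label


# Toy graph: x and y already carry labels v1 and v2 on one output edge.
EDGES = [("a", "b"), ("a", "c"), ("b", "x"), ("b", "y"),
         ("c", "x"), ("c", "y"), ("x", "p"), ("y", "q"), ("x", "D")]
PHI = label_remaining_edges(EDGES, {("x", "p"): "v1", ("y", "q"): "v2"}, "D")
```

On this toy graph every node with two labeled output edges ends up with two different labels, which is the invariant the argument above relies on.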

2.10.2  Proof  of  Lemma  4 

Observe that for any 2-to-2 node, the two $\rho$ values on the input edges are identical to the two $\rho$ values on the output edges. For a 2-to-1 node, the $\rho$ value on the output edge is equal to the $\rho$ value on one of the input edges. Therefore beginning with any edge in $\mathcal{E}_{\text{out}}(S)$, we may follow a path along only edges with the same $\rho$ value, and clearly we will hit all such edges. Property (1) immediately follows.

Property  (2)  follows  from  the  fact  that  2-to-2  nodes  always  operate  such  that 
from  the  symbols  on  the  two  output  edges,  it  is  possible  to  decode  the  symbols  on 
the  input  edges.  Therefore  the  destination  can  always  reverse  these  transformations 
to recover any earlier symbols sent in the network. The only exception is 2-to-1 nodes, which drop one of their two input symbols. The dropped symbol is a non-destination symbol, so it is clear that the destination can always decode the rest.

We now prove property (3). We claim that when the comparison fails at node $k$, it is impossible for the destination to decode $X_{f_2}$. We may assume that the destination has direct access to all symbols on edges immediately subsequent to edges in $A(v)$. This can only make $X_{f_2}$ easier to decode. Recall that $\rho(f_1) < \rho(f_2)$, so $X_{f_1}$ is forwarded directly on the output edge of $k$ not in $A(v)$. Therefore the destination can only decode $X_{f_2}$ if it can decode the symbol on the output edge of $k$ in $A(v)$. Continuing to follow the path through $A(v)$, suppose we reach an edge $e_1$ with $\text{tail}(e_1) = k'$, where $k'$ is a branch node. Let $e_2$ be the other input edge of $k'$. Even if $\rho(e_1) < \rho(e_2)$, meaning $k'$ would normally forward $X_{e_1}$ outside of $A(v)$, because $e_1$ carries a failed comparison bit, $k'$ will instead forward $X_{e_2}$ outside of $A(v)$. Again, the destination can only decode $X_{f_2}$ (or equivalently $X_{e_1}$) if it can decode the symbol on the output edge of $k'$ in $A(v)$. If we reach a node interacting with the non-destination symbol associated with $v$, then because of the failed comparison bit, the formerly non-destination symbol is forwarded outside of $A(v)$ and the symbol to decode continues traveling through $A(v)$. It will finally reach $v$, at which point it is dropped. Therefore it is never forwarded out of $A(v)$, so the destination cannot recover it.


2.10.3  Proof  of  Lemma  5 


From Corollary 3, it is enough to prove the existence of a $K$ matrix satisfying

$$|K_{e_{v_1}^*, e_{v_2}^*, Z}| \, |K_{f_1, f_2, Z}| \, |K_{e_{v_1}^*, f_1, Z}| \, |K_{e_{v_2}^*, f_2, Z}| < 0. \tag{2.108}$$

We construct a Vandermonde matrix $K$ to satisfy (2.108) for all $v_1, v_2$ and all $f_1, f_2$ in the following way. We will construct a bijective function (an ordering) $\alpha$ given by

$$\alpha : \{e_v^* : v \in \mathcal{V}_{2,1}\} \cup \mathcal{E}_{\text{in}}(D) \to \{1, \ldots, M + |\mathcal{V}_{2,1}|\}. \tag{2.109}$$

For each $v \in \mathcal{V}_{2,1}$, set $\alpha(e_v^*)$ to an arbitrary but unique number in $1, \ldots, |\mathcal{V}_{2,1}|$. We may now refer to a 2-to-1 node as $\alpha^{-1}(a)$ for an integer $a \in \{1, \ldots, |\mathcal{V}_{2,1}|\}$. Now set $\alpha(e)$ for $e \in \mathcal{E}_{\text{in}}(D)$ such that, in $\alpha$ order, the edge set $\{e_v^* : v \in \mathcal{V}_{2,1}\} \cup \mathcal{E}_{\text{in}}(D)$ is written

$$e^*_{\alpha^{-1}(1)},\, e^*_{\alpha^{-1}(2)},\, \ldots,\, e^*_{\alpha^{-1}(|\mathcal{V}_{2,1}|)},\, \Xi(\alpha^{-1}(|\mathcal{V}_{2,1}|)),\, \Xi(\alpha^{-1}(|\mathcal{V}_{2,1}| - 1)),\, \ldots,\, \Xi(\alpha^{-1}(1)). \tag{2.110}$$

That is, each $\Xi(v)$ set is consecutive in the ordering, but in the opposite order as the associated non-destination edges $e_v^*$. Now let $K$ be the Vandermonde matrix with constants given by $\alpha$. That is, the row associated with edge $e$ is given by

$$\begin{bmatrix} 1 & \alpha(e) & \alpha(e)^2 & \cdots & \alpha(e)^{M-3} \end{bmatrix}. \tag{2.111}$$


We claim the matrix $K$ given by (2.111) satisfies (2.108). Fix $v_1, v_2$, and $f_1 \in \Xi(v_1)$, $f_2 \in \Xi(v_2)$. Due to the Vandermonde structure of $K$, we can write the determinant of a square submatrix in terms of the constants $\alpha(e)$. For instance,

$$|K_{e_{v_1}^*, e_{v_2}^*, Z}| = [\alpha(e_{v_2}^*) - \alpha(e_{v_1}^*)] \prod_{e \in Z} [\alpha(e) - \alpha(e_{v_1}^*)][\alpha(e) - \alpha(e_{v_2}^*)] \prod_{e, e' \in Z,\, \alpha(e) < \alpha(e')} [\alpha(e') - \alpha(e)] \tag{2.112}$$

where we have assumed without loss of generality that the rows of $K_Z$ are ordered according to $\alpha$. Expanding the determinants in (2.108) as such gives

$$|K_{e_{v_1}^*, e_{v_2}^*, Z}| \, |K_{f_1, f_2, Z}| \, |K_{e_{v_1}^*, f_1, Z}| \, |K_{e_{v_2}^*, f_2, Z}| \tag{2.113}$$

$$= [\alpha(e_{v_2}^*) - \alpha(e_{v_1}^*)][\alpha(f_2) - \alpha(f_1)][\alpha(f_1) - \alpha(e_{v_1}^*)][\alpha(f_2) - \alpha(e_{v_2}^*)] \cdot \prod_{e \in Z} [\alpha(e) - \alpha(e_{v_1}^*)]^2 [\alpha(e) - \alpha(e_{v_2}^*)]^2 [\alpha(e) - \alpha(f_1)]^2 [\alpha(e) - \alpha(f_2)]^2 \cdot \prod_{e, e' \in Z,\, \alpha(e) < \alpha(e')} [\alpha(e') - \alpha(e)]^4. \tag{2.114}$$

Recall $f_1 \in \Xi(v_1)$, $f_2 \in \Xi(v_2)$. Since we chose $\alpha$ such that the $\Xi$ sets are in opposite order to the edges $e_v^*$, we have

$$[\alpha(e_{v_2}^*) - \alpha(e_{v_1}^*)][\alpha(f_2) - \alpha(f_1)] < 0. \tag{2.115}$$

Moreover, since all the $\Xi$ sets have larger $\alpha$ values than the edges $e_v^*$,

$$\alpha(f_1) - \alpha(e_{v_1}^*) > 0, \tag{2.116}$$

$$\alpha(f_2) - \alpha(e_{v_2}^*) > 0. \tag{2.117}$$

Hence, there is exactly one negative term in (2.114), from which we may conclude (2.108).
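
The sign argument can be checked numerically on a small instance. The sketch below is only an illustration under assumptions that are not taken from the text: two 2-to-1 nodes with $\alpha(e_{v_1}^*) = 1$ and $\alpha(e_{v_2}^*) = 2$, two destination edges per $\Xi$ set, and a number of Vandermonde columns chosen equal to the number of selected rows so that each submatrix is square. The names `det`, `sub_det` and `product_of_four` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in r] for r in rows]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return sign * result


def sub_det(alphas):
    """Determinant of the square Vandermonde block with one row per constant."""
    width = len(alphas)
    return det([[a ** k for k in range(width)] for a in alphas])


# Assumed toy ordering: the Xi sets come after the e* edges, in opposite order.
ALPHA_STAR = {"v1": 1, "v2": 2}
XI = {"v2": [3, 4], "v1": [5, 6]}


def product_of_four(f1, f2):
    """Left-hand side of (2.108) for f1 in Xi(v1) and f2 in Xi(v2)."""
    e1, e2 = ALPHA_STAR["v1"], ALPHA_STAR["v2"]
    z = sorted(a for xs in XI.values() for a in xs if a not in (f1, f2))
    return (sub_det([e1, e2] + z) * sub_det([f1, f2] + z)
            * sub_det([e1, f1] + z) * sub_det([e2, f2] + z))


SIGNS = [product_of_four(f1, f2) < 0 for f1, f2 in product(XI["v1"], XI["v2"])]
```

Every entry of `SIGNS` is true for this ordering: the squared and fourth-power factors in (2.114) are positive, so the sign is set by the four leading differences alone.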

2.10.4  Proof  of  Lemma  6 

The random vector $W$ is distributed according to the type of the message vector as it is produced at the source. We formally introduce the random vector $\tilde{W}$ representing the message as it is transformed in the network. As in our examples, this vector is distributed according to the joint type of the sequences as they appear in the network, after being corrupted by the adversary. For each edge $e$, we define $X_e$ and $\tilde{X}_e$ similarly as random variables jointly distributed with $W$ and $\tilde{W}$ respectively with distributions given by the expected and corrupted joint types.




For  every  pair  of  nodes  we  need  to  prove  both  of  the  following: 

If $i$ is the traitor, and $j \in \mathcal{L}$, $i$ cannot corrupt values in $K_{D \perp \{i,j\}} W$. (2.118)

If $j$ is the traitor, and $i \in \mathcal{L}$, $j$ cannot corrupt values in $K_{D \perp \{i,j\}} W$. (2.119)

In fact, each of these implies the other, so it will be enough to prove just one. Suppose (2.118) holds. Therefore, if the distribution observed by the destination of $K_{D \perp \{i,j\}} \tilde{W}$ does not match that of $K_{D \perp \{i,j\}} W$, then at least one of $i, j$ will not be in $\mathcal{L}$. If they both were in $\mathcal{L}$, it would have had to be possible for node $i$ to be the traitor, make it appear as if node $j$ were the traitor, but also corrupt part of $K_{D \perp \{i,j\}} W$. By (2.118), this is impossible. Hence, if $j$ is the traitor and $i \in \mathcal{L}$, then the distribution of $K_{D \perp \{i,j\}} \tilde{W}$ must remain uncorrupted. This vector includes $K_{D \perp j} W$, a vector that can certainly not be corrupted by node $j$. Since $\text{rank}(K_{D \perp j}) \ge M - 2$, and there are only $M - 2$ degrees of freedom, the only choice node $j$ has to ensure that the distribution of $K_{D \perp \{i,j\}} \tilde{W}$ matches $p$ is to leave this entire vector uncorrupted. That is, (2.119) holds.

Fix a pair $i, j$. We proceed to prove either (2.118) or (2.119). Doing so will require placing constraints on the actions of the traitor imposed by comparisons that occur inside the network, then applying one of the corollaries of Theorem 5 in Sec. 2.9. Let $K_{\perp i}$ be a basis for the space orthogonal to $K_i$. If node $i$ is the traitor, we have that $K_{\perp i} \tilde{W} \sim K_{\perp i} W$. Moreover, since $j \in \mathcal{L}$, $K_{D \perp j} \tilde{W} \sim K_{D \perp j} W$. These two constraints are analogous to (2.63) and (2.62) respectively, where the symbols on the output of node $i$ are analogous to $X_1, X_2$. The subspace of $K_D$ orthogonal to both $K_i$ and $K_j$ corresponds to $Z$ in the example. We now seek pairwise constraints of the form (2.64)-(2.66) from successful comparisons to apply Theorem 5.

Being able to apply Theorem 5 requires that $K_{D \perp j}$ has rank $M - 2$ for all $j$. Ensuring this has to do with the choices for the coefficients $\gamma_{i,1}, \gamma_{i,2}$ used in (2.92). A rank deficiency in $K_{D \perp j}$ is a singular event, so it is not hard to see that random choices for the $\gamma$ will cause this to occur with small probability. Therefore such $\gamma$ exist.

We  now  discuss  how  pairwise  constraints  on  the  output  symbols  of  i  or  j  are 
found.  Consider  the  following  cases  and  subcases: 

• i,jE U W2: Suppose node $i$ is the traitor. Let $e_1, e_2$ be the output edges of node $i$. For each $l = 1, 2$, we look for constraints on $X_{e_l}$ by following the $\rho = \rho(e_l)$ path until one of the following occurs:

— We reach an edge on the $\rho = \rho(e_l)$ path carrying a symbol influenced by node $j$: This can only occur immediately after a branch node $k$ with input edges $f_1, f_2$ where $\rho(f_1) = \rho(e_l)$, $\rho(f_2) < \rho(f_1)$, and $X_{f_2}$ is influenced by node $j$. At node $k$, a comparison occurs between $X_{f_1}$, which is influenced by node $i$ but not $j$, and $X_{f_2}$. If the comparison succeeds, then this places a constraint on the distribution of $(X_{f_1}, X_{f_2})$. If the comparison fails, the forwarding pattern changes such that the $\rho = \rho(e_l)$ path becomes a non-destination path; i.e., the value placed on $e_l$ does not affect any variables available at the destination. Hence, the subspace available at the destination that is corruptible by node $i$ is of dimension at most one.

— We reach node $j$ itself: In this situation, we make use of the fact that we only need to prove that node $i$ cannot corrupt values available at the destination that cannot also be influenced by node $j$. Consider whether the $\rho = \rho(e_l)$ path, between $i$ and $j$, contains a branch node $k$ with input edges $f_1, f_2$ such that $\rho(f_1) = \rho(e_l)$ and $\rho(f_2) > \rho(f_1)$. If there is no such node, then $X_{e_l}$ cannot influence any symbols seen by the destination that are not also being influenced by $j$. That is, $X_{e_l}$ is in $\text{span}(K_{i \to D} \cap K_{j \to D})$, so we do not have anything to prove. If there is such a branch node $k$, then the output edge $e$ of $k$ with $\rho(e) = \rho(f_2)$ contains a symbol influenced by $i$ and not $j$. We may now follow the $\rho = \rho(e)$ path from here to find a constraint on $X_{e_l}$. If a comparison fails further along causing the forwarding pattern to change such that the $\rho = \rho(e)$ path does not reach the destination, then the potential influence of $X_{e_l}$ on a symbol seen by the destination not influenced by node $j$ is removed, so again we do not have anything to prove.

— The $\rho = \rho(e_l)$ path leaves the network without either of the above occurring: Immediately before leaving the network, the symbol will be compared with a non-destination symbol. This comparison must succeed, because $j$ cannot influence the non-destination symbol. This gives a constraint on $X_{e_l}$.

We  may  classify  the  fates  of  the  two  symbols  out  of  i  as  discussed  above  as 
follows: 

1. Either the forwarding pattern changes such that the symbol does not reach the destination, or the symbol is in $\text{span}(K_{i \to D} \cap K_{j \to D})$, and so we do not need to prove that it cannot be corrupted. Either way, we may ignore this symbol.

2. The symbol leaves the network, immediately after a successful comparison with a non-destination symbol.

3. The symbol is successfully compared with a symbol influenced by node $j$. In particular, this symbol from node $j$ has a strictly smaller $\rho$ value than $\rho(e_l)$.

We divide the situation based on which of the above cases occur for $l = 1, 2$ as follows:

— Case 1 occurs for both $l = 1, 2$: We have nothing to prove.

— Case 1 occurs for (without loss of generality) $l = 1$: Either case 2 or 3 gives a successful comparison involving a symbol influenced by $X_{e_2}$. Applying Corollary 1 allows us to conclude that $X_{e_2}$ cannot be corrupted.

— Case 2 occurs for both $l = 1, 2$: If the two paths reach different non-destination symbols, then we may apply Lemma 5 to conclude that node $i$ can corrupt neither $X_{e_1}$ nor $X_{e_2}$. Suppose, on the other hand, that each path reaches the same non-destination path, in particular the one associated with 2-to-1 node $v$. Since $\phi(e_1) \ne \phi(e_2)$, assume without loss of generality that $\phi(e_1) \ne v$. We may follow the path starting from $e_1$ through $\Gamma(v)$ to find an additional constraint, after which we may apply Corollary 2. All symbols on this path are influenced by $X_{e_1}$. This path eventually crosses the non-destination path associated with $v$. If the symbol compared against the non-destination symbol at this point is not influenced by $j$, then the comparison succeeds, giving an additional constraint. Otherwise, there are two possibilities:

* The path through $\Gamma(v)$ reaches $j$: There must be a branch node on the path to $\Gamma(v)$ before reaching $j$ such that the path from $e_1$ has the smaller $\rho$ value. If there were not, then case 1 would have occurred. Consider the most recent such branch node $k$ in $\Gamma(v)$ before reaching $j$. Let $f_1, f_2$ be the input edges to $k$, where $f_1$ is on the path from $e_1$. We know $\rho(f_1) < \rho(f_2)$. The comparison at $k$ must succeed. Moreover, this successful comparison comprises a substantial constraint, because the only way the destination can decode $X_{f_2}$ is through symbols influenced by node $j$.

*  The  path  through  T(u)  does  not  reach  j :  Let  k  be  the  first  common 
node  on  the  paths  from  i  and  j  through  T(v).  Let  /i,  /2  be  the  input 
edges  of  k,  where  f\  is  on  the  path  from  i  and  /2  is  on  the  path  from 
j.  If  the  comparison  at  k  succeeds,  this  provides  a  constraint.  If  it 
fails,  then  the  forwarding  pattern  changes  such  that  the  p  =  p(/i) 
path  becomes  a  non-destination  path.  Since  we  are  not  in  case  1, 
p(e i)  ytz  p(fi),  but  a  symbol  influenced  by  Xei  is  compared  against 
a  symbol  on  the  p  =  p(/i)  path  at  a  branch  node  in  r(u).  This 
comparison  must  succeed,  providing  an  additional  constraint. 

- Case 3 occurs for (without loss of generality) $l = 1$, and either case 2 or
3 occurs for $l = 2$: We now suppose instead that node $j$ is the traitor.
That is, we will prove (2.119) instead of (2.118). Recall that a successful
comparison occurs at a branch node $k$ with input edges $f_1, f_2$ where $X_{f_1}$
is influenced by $X_{e_1}$, $X_{f_2}$ is influenced by node $j$, and $p(f_2) < p(f_1)$. Let
$e'_1, e'_2$ be the output edges of node $j$, and suppose that $p(e'_1) = p(f_2)$;
i.e. the symbol $X_{f_2}$ is influenced by $X_{e'_1}$. The success of the comparison
gives a constraint on $X_{e'_1}$. Since $p(f_2) < p(f_1)$, we may continue to
follow the $p = p(f_2)$ path from node $k$, and it continues to be not
influenced by node $i$. As above, we may find an additional constraint
on $X_{e'_1}$ by following this $p$ path until reaching a non-destination symbol
or reaching another significant branch node. Furthermore, we may find
a constraint on $X_{e'_2}$ in a similar fashion. This gives three constraints on
$X_{e'_1}, X_{e'_2}$, enough to apply Corollary 2, and conclude that node $j$ cannot
corrupt its output symbols.
• $i \in W_3 \cup V_{2,1} \setminus \mathcal{N}_{\mathrm{in}}(D)$, $j \in W_1 \cup W_2$: Assume node $i$ is the traitor. If $i \in V_{2,1}$
with single output edge $e$ such that $\psi(e) = 1$, then node $i$ controls no symbols
received at the destination and we have nothing to prove. Otherwise,
it controls just one symbol received at the destination, so any single
constraint on node $i$ is enough. Let $e'$ be the output edge of $i$ with $\psi(e') = 0$.
Since we assume $i \notin \mathcal{N}_{\mathrm{in}}(D)$, the $p = p(e')$ path is guaranteed to cross a
non-destination path after node $i$. As above, follow the $p = p(e')$ path until
reaching a branch node $k$ at which the symbol is combined with one influenced
by node $j$. If the comparison at node $k$ succeeds, it gives a constraint
on $X_{e'}$. If the comparison fails, then the forwarding pattern will change such
that the $p = p(e')$ path will fail to reach the destination, so we're done.

• $i \in W_1 \cup W_2$, $j \in \mathcal{N}_{\mathrm{in}}(D)$: Assume node $i$ is the traitor. By construction,
since one output edge of $j$ goes directly into the destination, the other must
be on a non-destination path. Hence, $j$ only controls one symbol at the
destination, so we again need to place only one constraint on node $i$. Let
$e \in \mathcal{E}_{\mathrm{out}}(i)$ be such that $\phi(e) \ne \phi(e')$ for all $e' \in \mathcal{E}_{\mathrm{out}}(j)$. This is always
possible, since the two output edges of $i$ have different $\phi$ values, and since
one output edge of $j$ goes directly to the destination, only one of the output
edges of $j$ has a $\phi$ value. Let $v = \phi(e)$. Follow the path from $e$ through $\Gamma(v)$
until reaching the non-destination symbol at node $k$ with input edges $f_1, f_2$.
Assume $X_{f_1}$ is influenced by $X_e$ and $X_{f_2}$ is a non-destination symbol. The
comparison between these two symbols must succeed, because node $j$ cannot
influence either $X_{f_1}$ or $X_{f_2}$. This places the necessary constraint on $X_e$.

• $i, j \in W_3 \cup V_{2,1}$: Nodes $i, j$ each control at most one symbol available at the
destination, so either one, in order to make it appear as if the other could be
the traitor, cannot corrupt anything.
2.10.5 Proof of Theorem 4 when the Cut-set Bound is $M - 3$

We now briefly sketch the proof of Theorem 4 for the case that the cut-set bound
is $M - 3$. The proof is far less complicated than the above proof for the $M - 2$ case,
but it makes use of many of the same ingredients. First note that the set of 2-to-2
nodes $i$ that cannot reach any 2-to-1 nodes must form a path. We next perform
a similar edge labeling as above, defining $\phi$ and $\psi$ as in (2.86)-(2.87). Properties
(A) and (B) must still hold, except that edges may have null labels, and property
(C) is replaced by

(C') For every 2-to-2 node that can reach at least one 2-to-1 node, at least one of
its output edges must have a non-null label.

Internal  nodes  operate  in  the  same  way  based  on  the  edge  labels  as  above,  where 
symbols  are  always  forwarded  along  edges  with  null  labels.  The  decoding  process 
is  the  same.  Proving  an  analogous  version  of  Lemma  6  requires  only  finding  a 
single  constraint  on  one  of  i  or  j.  This  is  always  possible  since  one  is  guaranteed 
to  have  a  label  on  an  output  edge,  unless  they  are  both  in  the  single  path  with  no 
reachable  2-to-l  nodes,  in  which  case  they  influence  the  same  symbol  reaching  the 
destination. 

Interestingly, this proof does not make use of the planarity of the graph. We
may therefore conclude that for networks satisfying properties (2) and (3) in the
statement of Theorem 4, the cut-set bound is always achievable if the cut-set bound
is strictly less than $M - 2$.
Figure 2.12: The Calamari Network, having capacity strictly less than the cut-set
bound. All edges have unit capacity. There is at most one traitor, but it is
restricted to be one of the black nodes.

2.11  Looseness  of  the  Cut-set  Bound 


So far, the only available upper bound on achievable rates has been the cut-set
bound. We have conjectured that for planar graphs this bound is tight, but that
still leaves open the question of whether there is a tighter upper bound for
non-planar graphs. It was conjectured in [37] that there is such a tighter bound, and
here we prove this conjecture to be true. We have already shown in Sec. 2.3 that the
limited-node and all-node problems are equivalent. Fig. 2.12 shows the Calamari¹
Network, a limited-node problem for which there is an active upper bound on
capacity other than the cut-set bound. It is easy to see that the transformation from
limited-node to all-node used to prove their equivalence in Sec. 2.3 does not change
the cut-set bound. Therefore, the looseness of the cut-set bound for the Calamari
Network implies that even for the all-node problem, the cut-set bound is not tight
in general. Furthermore, it is not hard to transform the Calamari Network into an
unequal-edge problem; this therefore confirms the conjecture in [37].

¹ Calamari is the cockroach of the sea. I think.
In the Calamari Network, there may be at most one traitor, but it is restricted
to be one of the black nodes. The cut-set bound is 2, but in fact the capacity is
no more than 1.5.

Consider a code achieving rate $R$. For $i = 1, 2, 3, 4$, let $X_i$ be the random
variable representing the value on the output edge of node $i$. Let $Y$ be the value
on edge $(9, D)$ and let $Z$ be the value on $(10, D)$. Let $p$ be the honest distribution
on these variables, and define the following alternative distributions:

$$q_3 = p(x_1 x_2 x_4)\, p(x_3)\, p(y \mid x_1 x_2 x_3)\, p(z \mid x_3 x_4), \tag{2.120}$$

$$q_4 = p(x_1 x_2 x_3)\, p(x_4)\, p(y \mid x_1 x_2 x_3)\, p(z \mid x_3 x_4). \tag{2.121}$$


We may write

$$R \le I_{q_3}(X_1 X_2 X_4; YZ) \tag{2.122}$$

because, if node 3 is the traitor, it may generate a completely independent version
of $X_3$ and send it along edge $(3, 7)$, resulting in the distribution $q_3$. In that case,
assuming the destination can decode properly, information about the message must
get through from the honest edges at the start of the network, $X_1, X_2, X_4$, to what
is received at the destination, $Y, Z$. From (2.122), we may write

$$R \le I_{q_3}(X_1 X_2 X_4; Z) + I_{q_3}(X_1 X_2 X_4; Y \mid Z) \tag{2.123}$$

$$\le I_{q_3}(X_4; Z) + I_{q_3}(X_1 X_2; Z \mid X_4) + 1 \tag{2.124}$$

$$= I_{q_3}(X_4; Z) + 1 \tag{2.125}$$

where in (2.124) we have used that the capacity of $(9, D)$ is 1, and in (2.125) that
$X_1 X_2 - X_4 - Z$ is a Markov chain according to $q_3$. Using a similar argument in
which node 4 is the traitor and it acts in a way to produce $q_4$, we may write

$$R \le I_{q_4}(X_3; Z) + 1. \tag{2.126}$$
Note that

$$q_3(x_3 x_4 z) = q_4(x_3 x_4 z). \tag{2.127}$$

In particular, the mutual informations in (2.125) and (2.126) can both be written
with respect to the same distribution. Therefore,

$$2R \le I_{q_3}(X_4; Z) + I_{q_3}(X_3; Z) + 2 \tag{2.128}$$

$$= I_{q_3}(X_3 X_4; Z) + I_{q_3}(X_3; X_4) - I_{q_3}(X_3; X_4 \mid Z) + 2 \tag{2.129}$$

$$\le I_{q_3}(X_3 X_4; Z) + 2 \tag{2.130}$$

$$\le 3 \tag{2.131}$$

where (2.130) follows from the nonnegativity of conditional mutual information and
the fact that $X_3, X_4$ are independent according to $q_3$, and (2.131) follows because the
capacity of $(10, D)$ is 1. Therefore, $R \le 1.5$.
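
The chain (2.128)-(2.131) can be sanity-checked numerically. The following is only a sketch under stated assumptions: it draws random toy distributions in which $X_3$ and $X_4$ are independent and $Z$ is a single bit (standing in for a unit-capacity edge), and the helper names `entropy`, `mutual_information`, `independent_inputs_joint` and `check_chain` are hypothetical, not taken from the thesis. It checks the identity behind (2.129) and the two inequalities that follow it.

```python
import itertools
import math
import random


def entropy(joint, keep):
    """Entropy in bits of the marginal on the coordinates listed in `keep`."""
    marginal = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[k] for k in keep)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def mutual_information(joint, a, b, given=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = list(a), list(b), list(given)
    return (entropy(joint, a + c) + entropy(joint, b + c)
            - entropy(joint, a + b + c) - entropy(joint, c))


def random_distribution(size, rng):
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def independent_inputs_joint(rng, alphabet=3):
    """Toy law of (X3, X4, Z): X3 and X4 independent, Z one bit from a random channel."""
    p3 = random_distribution(alphabet, rng)
    p4 = random_distribution(alphabet, rng)
    joint = {}
    for x3, x4 in itertools.product(range(alphabet), repeat=2):
        pz1 = rng.random()  # P(Z = 1 | x3, x4), chosen arbitrarily
        joint[(x3, x4, 0)] = p3[x3] * p4[x4] * (1 - pz1)
        joint[(x3, x4, 1)] = p3[x3] * p4[x4] * pz1
    return joint


def check_chain(joint, tol=1e-9):
    X3, X4, Z = [0], [1], [2]
    lhs = mutual_information(joint, X4, Z) + mutual_information(joint, X3, Z)
    joint_term = mutual_information(joint, X3 + X4, Z)
    identity = (joint_term
                + mutual_information(joint, X3, X4)
                - mutual_information(joint, X3, X4, given=Z))
    # (2.129) is an identity; (2.130) uses independence; (2.131) uses that Z is one bit.
    return (abs(lhs - identity) < tol
            and lhs <= joint_term + tol
            and joint_term <= 1 + tol)


rng = random.Random(0)
print(all(check_chain(independent_inputs_joint(rng)) for _ in range(200)))
```

A numerical check of this kind does not replace the proof; it only guards against sign or index slips when transcribing the chain.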


Observe that all inequalities used in this upper bound were so-called Shannon-type
inequalities. For the non-Byzantine problem, there is a straightforward procedure
to write down all the Shannon-type inequalities relevant to a particular
network coding problem, which in principle can be used to find an upper bound.
This upper bound is more general than any cut-set upper bound, and in some
multi-source problems it has been shown to be tighter than any cut-set bound.
This example illustrates that a similar phenomenon occurs in the Byzantine problem
even for a single source and single destination. As the Byzantine problem seems
to have much in common with the multi-source non-Byzantine problem, it would
be worthwhile to formulate the tightest possible upper bound using only Shannon-type
inequalities. However, it is yet unclear what the “complete” list of Shannon-type
inequalities would be for the Byzantine problem. This example certainly
demonstrates one method of finding them, but whether there are fundamentally
different methods to find inequalities that could still be called Shannon-type, or
even how to compile all inequalities using this method, is unclear. Moreover, it has
been shown in the non-Byzantine problem that there can be active non-Shannon-type
inequalities. It is therefore conceivable that non-Shannon-type inequalities
could be active even for a single source under Byzantine attack.

Figure 2.13: The Beetle Network. All edges have unit capacity except the dashed
edge, which has zero capacity.


2.12  More  on  Cut-Set  Bounds 

We first give an example network illustrating the necessity of requiring no backwards
edges in Theorem 3. This example, the Beetle Network shown in Fig. 2.13,
is also interesting in that it has a zero-capacity edge which strictly increases capacity.
We then proceed to state and prove a cut-set bound tighter than Theorem 3,
which allows cuts with backwards edges but has a more elaborate method of
determining the upper bound given a cut. For other cut-set bounds on adversarial
problems, see [37, 38].

2.12.1  The  Beetle  Network 

The Beetle Network, shown in Figure 2.13, under the presence of a single traitor
node, has two interesting properties. First, there is a cut with a backwards edge for
which the value of the right hand side of (2.4) is strictly less than capacity. This
illustrates the need for the condition in Theorem 3 that cuts have no backwards
edges. Second, it has a zero-capacity edge, the presence of which has a positive
effect on the capacity. That is, the capacity of this network, as we will demonstrate,
is 1, but if the zero-capacity edge $(4, D)$ were removed, the capacity would be 0, as
can easily be verified by Theorem 3. The reason for this is that, as we have seen,
comparison operations can increase capacity, so we can use the zero-capacity edge
to hold a comparison bit.

We may apply Theorem 3 with $A = \{S, 1, 2, 3, 4\}$ and $T = \{1, 2\}$ to conclude
that the capacity is no more than 1. We will shortly present a code to achieve rate
1. Now consider the cut $A = \{S, 1, 2, 4\}$. For this cut $(3, 4)$ is a backwards edge,
so we cannot apply Theorem 3. Note that if we set $T = \{1, 2\}$, the right hand side
of (2.4) would evaluate to 0, strictly less than capacity.

We now present a simple linear code with a comparison for the Beetle Network
achieving rate 1. Each unit-capacity edge carries a copy of the message $w$. That
is, the source sends $w$ along all three of its output links, and nodes 1, 2, and 3
each receive one copy of $w$ and forward it along all of their output links. Node 4
receives a copy of $w$ from the source and one from node 3. It compares them
and sends to the destination one of the symbols $=$ or $\ne$ depending on whether the
two copies agreed. Because $w$ may be a vector of arbitrary length, sending this
single bit along edge $(4, D)$ takes zero rate, so we do not exceed the edge capacity.

The decoding procedure is as follows. Let $w_1$, $w_2$, and $w_3$ be the values of $w$
received at the destination from nodes 1, 2, and 3 respectively. If either $w_2 \ne w_3$
or the destination receives $\ne$ from node 4, then certainly the traitor must be one
of nodes 2, 3, or 4, so $w_1$ is trustworthy and the destination decodes from it. Now
consider the case that $w_2 = w_3$ and the destination receives $=$ from node 4. The
destination decodes from $w_2$ or $w_3$. Certainly if the traitor is either node 1 or
4, then $w_2 = w_3 = w$. If the traitor is node 3, then $w_2 = w$, so we still decode
correctly. If the traitor is node 2, then it must send the same value of $w$ to both the
destination and node 3, because node 3 simply forwards its copy to the destination,
and we know $w_2 = w_3$. Furthermore, this value of $w$ must be the true one, because
otherwise node 4 would observe that the copy sent along edge $(3, 4)$ is different
from that sent from the source, so it would transmit $\ne$ to the destination. Since
it did not, node 2 cannot have altered any of its output values. Therefore the
destination always decodes correctly.
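
The case analysis above can be checked by brute force. The sketch below is an illustration under assumptions, not the thesis's own construction: the edge list is inferred from the description of the code (source to nodes 1, 2 and 4; node 2 to node 3; node 3 to node 4; nodes 1 to 4 to the destination), the message alphabet is a toy set of three values, and all function names are hypothetical. It enumerates every single traitor and every value the traitor could place on its outgoing edges, and compares the decoding rule described above with the same rule when the comparison bit from node 4 is ignored.

```python
from itertools import product

ALPHABET = (0, 1, 2)  # toy message alphabet (assumption)

# Outgoing edges of each possible traitor, under the inferred topology.
OUT_EDGES = {
    1: [(1, "D")],
    2: [(2, "D"), (2, 3)],
    3: [(3, "D"), (3, 4)],
    4: [(4, "D")],
}


def transmit(w, traitor, forged):
    """Return (w1, w2, w3, flag) as seen by the destination.

    `forged` maps each outgoing edge of the traitor to the value it injects;
    every other edge carries what the honest code prescribes.
    """
    def value(edge, honest):
        return forged[edge] if edge[0] == traitor else honest

    w1 = value((1, "D"), w)
    w2 = value((2, "D"), w)
    into_3 = value((2, 3), w)
    w3 = value((3, "D"), into_3)
    into_4 = value((3, 4), into_3)
    flag = value((4, "D"), into_4 == w)  # True stands for "=", False for "not equal"
    return w1, w2, w3, flag


def decode(w1, w2, w3, flag):
    """Decoding rule from the text."""
    if w2 != w3 or not flag:
        return w1  # the traitor is among nodes 2, 3, 4, so node 1 is honest
    return w2


def decode_without_flag(w1, w2, w3, flag):
    """Same rule with the comparison bit ignored, for contrast."""
    return w1 if w2 != w3 else w2


def always_correct(decoder):
    for w, traitor in product(ALPHABET, OUT_EDGES):
        edges = OUT_EDGES[traitor]
        choices = [(True, False) if e == (4, "D") else ALPHABET for e in edges]
        for values in product(*choices):
            forged = dict(zip(edges, values))
            if decoder(*transmit(w, traitor, forged)) != w:
                return False
    return True


print(always_correct(decode), always_correct(decode_without_flag))
```

In this toy model the rule that ignores the flag fails when node 2 sends the same wrong value on both of its outgoing edges, which is exactly the attack that the comparison at node 4 is there to expose.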

2.12.2  Tighter  Cut-Set  Upper  Bound 

The  following  theorem  is  a  tighter  cut-set  bound  than  Theorem  3,  as  it  allows  cuts 
with  backwards  edges. 

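Theorem 6 Fix a cut $A \subseteq V$ with $S \in A$ and $D \notin A$. Also fix sets of nodes $T$
and $T^*$ with $T^* \subseteq T$ and

$$|T| + |T^*| \le 2s. \tag{2.132}$$

Let $B$ be the set of nodes that can reach a node in $A \setminus T$. Then

$$C \le |\{(i,j) \in E : i \in A \setminus T,\ j \notin A\}| + |\{(i,j) \in E : i \in A \cap T \setminus T^*,\ j \in A^c \cap B\}|. \tag{2.133}$$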


Proof: Choose a coding order on the nodes in $T \setminus T^*$ written as

$$(t_1, \ldots, t_{|T \setminus T^*|}). \tag{2.134}$$

That is, if there is a path from $t_u$ to $t_v$, then $u < v$. Let

$$T_1 = T^* \cup \{t_1, \ldots, t_{s - |T^*|}\}, \tag{2.135}$$

$$T_2 = T^* \cup \{t_{s - |T^*| + 1}, \ldots, t_{|T \setminus T^*|}\}. \tag{2.136}$$

Note that $|T_1|, |T_2| \le s$ and $T_1 \cup T_2 = T$. For $l = 1, 2$, let $E_l$ be the set of edges
$(i,j)$ with $i \in T_l \setminus T^*$ and $j \in A^c \setminus B$. Let $E^*$ be the set of edges $(i,j)$ with $i \in T^*$
and $j \in A^c$. Finally, let $E_A$ be the set of edges crossing the cut; that is, edges
$(i,j)$ with $i \in A$ and $j \notin A$. Let $\tilde{E} = E_A \setminus E_1 \setminus E_2 \setminus E^*$. Observe that (2.133) can
equivalently be written

$$C \le |\tilde{E}|. \tag{2.137}$$

Suppose (2.133) were not true. Then there would exist a code achieving a rate
$R$ such that

$$R > |\tilde{E}|. \tag{2.138}$$

We will consider two possibilities, one when $T_1$ are the traitors and they alter the
values on $E_1 \cup E^*$, and one when $T_2$ are the traitors and they alter the values on
$E_2 \cup E^*$. Note that there may be edges out of the set of traitors whose values
are not altered; on these edges the traitors will act honestly, performing the code
as it is designed. We will show that by (2.138), it is possible for the traitors to act
in such a way in these two cases that even though the messages at the source are
different, all values sent across the cut are the same; therefore the destination will
not be able to distinguish all messages.

Let $x_{E^*}$ be one possible value sent on the edges in $E^*$. Both possible sets of
traitors may influence the values on $E^*$, and in both cases they will place $x_{E^*}$ on
these edges. For any set of edges $F$, define the function

$$X_F : 2^{nR} \times \prod_{e \in E^*} 2^{n} \to \prod_{e \in F} 2^{n} \tag{2.139}$$

such that when the message is $w$, and all nodes act honestly except for $T^*$ which
place $x_{E^*}$ on $E^*$, the values on edges in $F$ are given by $X_F(w, x_{E^*})$.

Consider an edge $(i,j) \in \tilde{E}$. We claim that the value on this edge depends
only on the message and $x_{E^*}$; it does not depend on the values placed on $E_1$ or
$E_2$ by the traitors. If $i$ is a traitor, then by construction $i$ acts honestly on this
edge. Consider any path from the source passing through $(i,j)$. We wish to show
that at no point a value is placed on an edge in this path that deviates from the
honest code, except at edges in $E^*$. The only other point at which it might occur
would be at an earlier edge $(i',j')$. However, $j'$ is on a path leading to $i$. If
$i \in A \setminus T$, then $j' \in B$, so $(i',j') \notin E_1 \cup E_2$, so the value on this edge is not
changed by the traitor. If $i \in T$, then $j$ must be in $B$, meaning $j'$ is also in $B$, so
again $(i',j') \notin E_1 \cup E_2$. Therefore the values placed on $\tilde{E}$ are exactly $X_{\tilde{E}}(w, x_{E^*})$
no matter which set of nodes $T_1$ or $T_2$ is the traitor. By (2.138), there exist two
messages $w_1$ and $w_2$ such that

$$X_{\tilde{E}}(w_1, x_{E^*}) = X_{\tilde{E}}(w_2, x_{E^*}). \tag{2.140}$$

We now specify the two cases that confuse messages $w_1$ and $w_2$ at the destination:

1. The true message is $w_1$ and the traitors are $T_1$. They place $x_{E^*}$ on $E^*$ and
$X_{E_1}(w_2, x_{E^*})$ on $E_1$. Let $x'_{E_2}$ be the value placed on $E_2$ in this case. Recall
that the values on $\tilde{E}$ are given by (2.140).

2. The true message is $w_2$ and the traitors are $T_2$. They place $x_{E^*}$ on $E^*$ and
$x'_{E_2}$ on $E_2$. Again, the values on $\tilde{E}$ are given by (2.140). Moreover, because
of our choice of $T_1$ and $T_2$ in terms of the coding order in (2.135)-(2.136),
edges in $E_2$ are entirely downstream of those in $E_1$, so the values on $E_1$ are
$X_{E_1}(w_2, x_{E^*})$.

The values on all edges crossing the cut in both cases are the same. Therefore, all
values received by the destination are the same, so it must make an error on one
of the two messages. □

Note that if $A$ has no backwards edges, then $B \subseteq A$, so the second term in
(2.133) would be 0. Hence we recover Theorem 3.

We briefly illustrate an application of Theorem 6 on the Beetle Network for the
cut with a backwards edge. Let $A = \{S, 1, 2, 4\}$ and $T = \{1, 2\}$. The set $B$ consists
of $\{S, 1, 2, 3, 4\}$, so the second term in (2.133) counts the edge $(2, 3)$. Therefore
(2.133) gives an upper bound of 1. This is a correct bound, even though it would
not be had the second term in (2.133) not been included.
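
The two terms of (2.133) are mechanical to evaluate once the graph is written down. The following sketch makes several assumptions that should be read as such: the edge list of the Beetle Network is inferred from the description of its code rather than read off Figure 2.13, edges are weighted by capacity so that the dashed edge contributes zero, a node is taken to reach itself, and the names `CAPACITY`, `nodes_reaching` and `theorem6_terms` are hypothetical.

```python
# Edge capacities of the Beetle Network as inferred from the text (an assumption).
CAPACITY = {
    ("S", 1): 1, ("S", 2): 1, ("S", 4): 1,
    (1, "D"): 1, (2, "D"): 1, (2, 3): 1,
    (3, "D"): 1, (3, 4): 1,
    (4, "D"): 0,  # the dashed zero-capacity edge
}


def nodes_reaching(targets):
    """All nodes with a directed path (possibly of length zero) to a node in `targets`."""
    reach = set(targets)
    changed = True
    while changed:
        changed = False
        for (i, j) in CAPACITY:
            if j in reach and i not in reach:
                reach.add(i)
                changed = True
    return reach


def theorem6_terms(A, T, T_star=frozenset(), s=1):
    """Return the first and second terms on the right hand side of (2.133)."""
    A, T, T_star = set(A), set(T), set(T_star)
    assert "S" in A and "D" not in A
    assert T_star <= T and len(T) + len(T_star) <= 2 * s
    B = nodes_reaching(A - T)
    first = sum(c for (i, j), c in CAPACITY.items()
                if i in A - T and j not in A)
    second = sum(c for (i, j), c in CAPACITY.items()
                 if i in (A & T) - T_star and j not in A and j in B)
    return first, second


print(theorem6_terms({"S", 1, 2, 4}, {1, 2}))     # cut with the backwards edge (3, 4)
print(theorem6_terms({"S", 1, 2, 3, 4}, {1, 2}))  # cut without backwards edges
```

For the cut with the backwards edge, the first term alone evaluates to 0 in this model and the second term contributes the edge $(2, 3)$, so the total is 1, in line with the discussion above.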


2.13  Proof  of  Bound  on  Linear  Capacity  for  the  Cockroach 
Network 

We show that no linear code for the Cockroach Network, shown in Figure 2.1, can
achieve a rate higher than 4/3. Fix any linear code. For any link $(i,j)$, let $X_{i,j}$
be the value placed on this link. For every node $i$, let $X_i$ be the set of messages
on all links out of node $i$, and $Y_i$ be the set of messages on all links into node $i$.
Let $G_{X_i \to Y_j}$ be the linear transformation from $X_i$ to $Y_j$, assuming all nodes behave
honestly. Observe that

$$Y_D = G_{X_S \to Y_D} X_S(w) + \sum_i G_{X_i \to Y_D} e_i \tag{2.141}$$

where $e_i$ represents the difference between what a traitor places on its outgoing
links and what it would have placed on those links if it were honest. Only one
node is a traitor, so at most one of the $e_i$ is nonzero. Note also that the output
value of the source $X_S$ is a function of the message $w$. We claim that for any
achievable rate $R$,

$$R \le \frac{1}{n} \left[ \operatorname{rank}(G_{X_S \to Y_D}) - \max_{i,j} \operatorname{rank}(G_{X_i X_j \to Y_D}) \right] \tag{2.142}$$

where $n$ is the block length used by this code. To show this, first note that for any
pair of nodes $i, j$ there exist $K, H_1, H_2$ such that

$$G_{X_S \to Y_D} = K + G_{X_i \to Y_D} H_1 + G_{X_j \to Y_D} H_2 \tag{2.143}$$

and where

$$\operatorname{rank}(K) = \operatorname{rank}(G_{X_S \to Y_D}) - \operatorname{rank}(G_{X_i X_j \to Y_D}). \tag{2.144}$$

That is, the first term on the right hand side of (2.143) represents the part of the
transformation from $X_S$ to $Y_D$ that cannot be influenced by $X_i$ or $X_j$. Consider
the case that $\operatorname{rank}(K) < nR$. Then there must be two messages $w_1, w_2$ such that
$K X_S(w_1) = K X_S(w_2)$. If the message is $w_1$, node $i$ may be the traitor and set

$$e_i = H_1 (X_S(w_2) - X_S(w_1)). \tag{2.145}$$

Alternatively, if the message is $w_2$, node $j$ may be the traitor and set

$$e_j = H_2 (X_S(w_1) - X_S(w_2)). \tag{2.146}$$

In either case, the value received at the destination is

$$Y_D = K X_S(w_1) + G_{X_i \to Y_D} H_1 X_S(w_2) + G_{X_j \to Y_D} H_2 X_S(w_1).$$

Therefore, these two cases are indistinguishable to the destination, so it must make
an error for at least one of them. This proves (2.142).
Now we return to the specific case of the Cockroach Network. Observe that
$X_{4,D}$ is a linear combination of $X_{1,4}$ and $X_{2,4}$. Let $k_1$ be the number of dimensions
of $X_{4,D}$ that depend only on $X_{1,4}$ and are independent of $X_{2,4}$. Let $k_2$ be the
number of dimensions of $X_{4,D}$ that depend only on $X_{2,4}$, and let $k_3$ be the number
of dimensions that depend on both $X_{1,4}$ and $X_{2,4}$. Certainly $k_1 + k_2 + k_3 \le n$.
Similarly, let $l_1, l_2, l_3$ be the number of dimensions of $X_{5,D}$ that depend only on
$X_{2,5}$, that depend only on $X_{3,5}$, and that depend on both respectively. Finally, let
$m_1$ and $m_2$ be the number of dimensions of $X_{1,D}$ and $X_{3,D}$ respectively.

We may write the following:

$$\operatorname{rank}(G_{X_S \to Y_D}) - \operatorname{rank}(G_{X_2 X_3 \to Y_D}) \le m_1 + k_1,$$

$$\operatorname{rank}(G_{X_S \to Y_D}) - \operatorname{rank}(G_{X_1 X_3 \to Y_D}) \le k_2 + l_1,$$

$$\operatorname{rank}(G_{X_S \to Y_D}) - \operatorname{rank}(G_{X_1 X_2 \to Y_D}) \le l_2 + m_2.$$

Therefore, using (2.142), any achievable rate $R$ is bounded by

$$R \le \frac{1}{n} \min\{m_1 + k_1,\ k_2 + l_1,\ l_2 + m_2\} \tag{2.147}$$

subject to

$$k_1 + k_2 + k_3 \le n, \tag{2.148}$$

$$l_1 + l_2 + l_3 \le n, \tag{2.149}$$

$$m_1 \le n, \tag{2.150}$$

$$m_2 \le n. \tag{2.151}$$

It is not hard to show that this implies $R \le 4/3$.
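
One way to see the last step is to bound the minimum by the average. The short derivation below is a sketch that assumes the three terms inside the minimum in (2.147) are $m_1 + k_1$, $k_2 + l_1$ and $l_2 + m_2$, which is the reading suggested by the definitions of the $k$, $l$ and $m$ quantities above; the particular choice of values in the last line is only an illustration that the resulting bound cannot be improved by this argument.

```latex
\begin{align*}
3nR &\le (m_1 + k_1) + (k_2 + l_1) + (l_2 + m_2)
      && \text{each term is at least } nR \text{ by (2.147)} \\
    &= m_1 + m_2 + (k_1 + k_2) + (l_1 + l_2) \\
    &\le n + n + n + n = 4n
      && \text{by (2.148)--(2.151)},
\end{align*}
so $R \le 4/3$. The constraints allow equality: taking
$m_1 = m_2 = n$, $k_1 = l_2 = n/3$, $k_2 = l_1 = 2n/3$, $k_3 = l_3 = 0$
makes all three terms equal to $4n/3$.
```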
CHAPTER 3

SLEPIAN-WOLF 


3.1  Introduction 

Fig. 3.1 shows the multiterminal source coding problem of Slepian and Wolf [40].
At each time $t = 1, \ldots, n$, a source generates an independent copy of the correlated
random variables $Y_1, \ldots, Y_L$ according to the distribution $p(y_1 \cdots y_L)$. Each
sequence $Y_i^n$ is delivered to the corresponding node $i$. The $L$ nodes operate
independently of one another. Node $i$ encodes its observation at rate $R_i$ and transmits
the encoded version to a common decoder, which attempts to exactly recover all the
sources with high probability. Slepian and Wolf characterized in [40] the complete
achievable rate region for this problem (that is, the set of rate vectors $(R_1, \ldots, R_L)$
at which it is possible for the decoder to recover all sources), and they found that
the sum-rate can be made as low as the joint entropy of all sources:

$$H(Y_1 \cdots Y_L). \tag{3.1}$$

This is precisely the minimum rate that could be achieved if all the sources were
observed by a single node, as was originally shown by Shannon [49] in his source
coding theorem. The surprising result of Slepian-Wolf, then, is that no additional
sum-rate is required when the nodes are separated from each other.

Figure 3.1: The Slepian-Wolf multiterminal source coding problem. The sources
$Y_1^n, \ldots, Y_L^n$ are independent and identically distributed in time, and correlated in
space as specified by the joint distribution $p(y_1 \cdots y_L)$. Each source sequence $Y_i^n$
is observed by node $i$ and encoded at rate $R_i$ to a common decoder. The decoder
produces an estimate $\hat{Y}_i^n$ for each source sequence, attempting to match the sources
exactly with high probability.

In this chapter, we consider a modification to this classic problem in which an
adversary controls an unknown subset of nodes, and may transmit arbitrary data
to the decoder from those nodes. It is obvious that observations made by these
traitors are irretrievable unless the traitors choose to deliver them to the decoder.
Thus the best the decoder can hope to achieve is to reconstruct the observations
of the honest nodes. A simple procedure is to ignore the statistical correlations
among the observations and collect data from each node individually. The total
sum rate of such an approach is $\sum_i H(Y_i)$. One expects however that this sum
rate can be lowered if the correlation structure is not ignored.
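
To get a feel for the gap between the two sum rates, the sketch below compares the joint entropy with the sum of the individual entropies for a small example. The source distribution is hypothetical and chosen only for illustration: two binary sources, where the first is a fair bit and the second differs from it with probability 0.1.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def marginal(joint, index):
    """Marginal distribution of the coordinate `index` of a joint pmf."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[index]] = out.get(outcome[index], 0.0) + p
    return out


FLIP = 0.1  # hypothetical probability that Y2 differs from Y1
joint = {
    (y1, y2): 0.5 * (FLIP if y1 != y2 else 1 - FLIP)
    for y1 in (0, 1)
    for y2 in (0, 1)
}

joint_entropy = entropy(joint.values())
sum_of_marginals = sum(entropy(marginal(joint, i).values()) for i in range(2))
print(f"H(Y1 Y2) = {joint_entropy:.3f} bits")
print(f"H(Y1) + H(Y2) = {sum_of_marginals:.3f} bits")
```

For this example the joint entropy is roughly 1.47 bits against 2 bits for separate encoding, so ignoring the correlation costs about half a bit per source pair.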

Standard coding techniques for the Slepian-Wolf problem have no mechanism
for handling any deviations from the agreed-upon encoding functions by the nodes.
Even a random fault by a single node could have devastating consequences for
the accuracy of the source estimates produced at the decoder, to say nothing of
a Byzantine attack on multiple nodes. In particular, because Slepian-Wolf coding
takes advantage of the correlation among sources, manipulating the codeword for
one source can alter the accuracy of the decoder’s estimate for other sources. It
will turn out that for most source distributions, the sum rate given in (3.1) cannot
be achieved if there is even a single traitor. Our goal is to characterize the lowest
achievable sum-rate for this problem, and in some cases the complete achievable
rate region.

3.1.1  Redefining  Achievable  Rate 

The nature of Byzantine attack requires three modifications to the usual notion of
achievable rate. The first, as mentioned above, is that small probability of error
is required only for honest sources, even though the decoder may not know which
sources are honest. This requirement is reminiscent of [5], in which the lieutenant
generals need only perform the commander general’s order if the commander is
not a traitor, even though the lieutenants might not be able to decide this with
certainty.

The  next  modification  is  that  there  must  be  small  probability  of  error  no  matter 
what  the  traitors  do.  This  is  essentially  the  definition  of  Byzantine  attack. 

The final modification has to do with which nodes are allowed to be traitors.
Let $\mathcal{H}$ be the set of honest nodes, and $\mathcal{T} = \{1, \ldots, L\} \setminus \mathcal{H}$ the set of traitors. A
statement that a code achieves a certain rate must include the list of sets of nodes
that this code can handle as the set of traitors. That is, given such a list, we say
that a rate is achieved if there exists a code with small probability of error when
the actual set of traitors is in fact on the list. Hence a given code may work for
some lists and not others, so the achievable rates will depend on the specified list.
It will be more convenient to specify not the list of allowable sets of traitors, but
rather the list of allowable sets of honest nodes. We define $\mathfrak{H} \subset 2^{\{1, \ldots, L\}}$ to be this
list. Thus small probability of error is required only when $\mathcal{H} \in \mathfrak{H}$. One special
case is when the code can handle any group of at most $s$ traitors. That is,

$$\mathfrak{H} = \mathfrak{H}_s = \{\mathcal{S} \subset \{1, \ldots, L\} : |\mathcal{S}| \ge L - s\}.$$

Observe that achievable rates depend not just on the true set of traitors but also
on the collection $\mathfrak{H}$, because the decoder’s willingness to accept more and more
different groups of traitors allows the true traitors to get away with more without
being detected. Thus we see a trade-off between rate and security: in order to
handle more traitors, one needs to be willing to accept a higher rate.
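
The special case of at most $s$ traitors is easy to enumerate for small networks, which makes the growth of the list of allowable honest sets concrete. The following is a minimal sketch; the function name `honest_sets` and the choice $L = 3$ are illustrative assumptions.

```python
from itertools import combinations


def honest_sets(L, s):
    """All subsets of {1, ..., L} with at least L - s elements."""
    nodes = range(1, L + 1)
    return [
        frozenset(subset)
        for size in range(max(L - s, 0), L + 1)
        for subset in combinations(nodes, size)
    ]


for s in (0, 1, 2):
    collection = honest_sets(3, s)
    print(s, len(collection), sorted(sorted(h) for h in collection))
```

Every honest set allowed for a given $s$ is also allowed for $s + 1$, so a code designed for the larger list must guard against strictly more attack scenarios.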

3.1.2  Fixed- Rate  Versus  Variable-Rate  Coding 

In  standard  source  coding,  an  encoder  is  made  up  of  a  single  encoding  function. 
We  will  show  that  this  fixed-rate  setup  is  suboptimal  for  this  problem,  in  the  sense 
that  we  can  achieve  lower  sum  rates  using  variable-rate  coding.  By  variable-rate 
we  mean  that  the  number  of  bits  transmitted  per  source  value  by  a  particular 
node  will  not  be  fixed.  Instead,  the  decoder  chooses  the  rates  at  “run  time”  in  the 
following  way.  Each  node  has  a  finite  number  of  encoding  functions,  all  of  them 
fixed beforehand, but with potentially different output alphabets. The coding
session is then made up of a number of transactions. Each transaction begins with
the  decoder  deciding  which  node  will  transmit,  and  which  of  its  several  encoding 
functions  it  will  use.  The  node  then  executes  the  chosen  encoding  function  and 
transmits  the  output  back  to  the  decoder.  Finally,  the  decoder  uses  the  received 
message to choose the next node and encoding function, beginning the next
transaction, and so on. Thus a code is made up of a set of encoding functions for each
node,  a  method  for  the  decoder  to  choose  nodes  and  encoding  functions  based  on 
previously  received  messages,  and  lastly  a  decoding  function  that  takes  all  received 
messages  and  produces  source  estimates. 

Note that the decoder has the ability to transmit some information back to the
nodes, but this feedback is limited to the choice of encoding function. Since the
number of encoding functions need not grow with the block length, this represents
zero-rate feedback.
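
The transaction structure described above can be sketched as a simple polling loop. This is only an illustration under stated assumptions: the names `run_session`, `choose_next`, `encoders`, and `decode` are hypothetical, the encoding functions are stand-ins that return fixed strings, and the toy polling policy ignores the received messages, whereas the decoder in the text chooses each next transaction based on what it has received so far.

```python
def run_session(choose_next, encoders, decode):
    """Poll nodes one transaction at a time, then decode.

    choose_next(history) -> (node, function_index), or None to stop.
    encoders[node][function_index]() -> message from that node.
    decode(history) -> source estimates.
    """
    history = []  # (node, function_index, message) for each transaction
    while True:
        choice = choose_next(history)
        if choice is None:
            break
        node, j = choice
        message = encoders[node][j]()  # the node runs the requested function
        history.append((node, j, message))
    return decode(history)


# Toy usage: two nodes, each with a short and a long encoding function.
encoders = {
    1: [lambda: "a", lambda: "aaaa"],
    2: [lambda: "b", lambda: "bbbb"],
}

def choose_next(history):
    # Hypothetical policy: poll node 1 at the short rate, then node 2 at the long rate.
    plan = [(1, 0), (2, 1)]
    return plan[len(history)] if len(history) < len(plan) else None

def decode(history):
    return {node: message for node, _, message in history}

estimates = run_session(choose_next, encoders, decode)
```

The only information flowing from decoder to node is the pair `(node, j)`, which is why the feedback can be made negligible compared to the forward rate.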

In  variable-rate  coding,  since  the  rates  are  only  decided  upon  during  the  coding 
session,  there  is  no  notion  of  an  L-dimensional  achievable  rate  region.  Instead,  we 
only  discuss  achievable  sum  rates. 

3.1.3  Traitor  Capabilities 

An  important  consideration  with  Byzantine  attack  is  the  information  to  which  the 
traitors  have  access.  First,  we  assume  that  the  traitors  have  complete  knowledge 
of  the  coding  scheme  used  by  the  decoder  and  honest  nodes.  Furthermore,  we 
always  assume  that  they  can  communicate  with  each  other  arbitrarily.  For  variable- 
rate  coding,  they  may  have  any  amount  of  ability  to  eavesdrop  on  transmissions 
between  honest  nodes  and  the  decoder.  We  will  show  that  this  ability  has  no  effect 
on  achievable  rates.  We  assume  with  fixed-rate  coding  that  all  nodes  transmit 
simultaneously,  so  it  does  not  make  sense  that  traitors  could  eavesdrop  on  honest 
nodes’  transmissions  before  making  their  own,  as  that  would  violate  causality. 
Thus  we  assume  for  fixed-rate  coding  that  the  traitors  cannot  eavesdrop. 

The  key  factor,  however,  is  the  extent  to  which  the  traitors  have  direct  access  to 
information  about  the  sources.  We  assume  the  most  general  memoryless  case,  that 
the traitors have access to the random variable $W$, where $W$ is distributed i.i.d.
with $(Y_1 \cdots Y_L)$ according to the conditional distribution $r(w|y_1 \cdots y_L)$. A natural
assumption would be that $W$ always includes $Y_i$ for traitors $i$, but in fact this need
not be the case. An important special case is where $W = (Y_1, \ldots, Y_L)$, i.e. the
traitors have perfect information.

We  assume  that  the  distribution  of  W  depends  on  who  the  traitors  are,  and 
that  the  decoder  may  not  know  exactly  what  this  distribution  is.  Thus  each  code 
is associated with a function $\mathcal{R}$ that maps elements of $\mathfrak{H}$ to sets of conditional
distributions $r$. The relationship between $r$ and $\mathcal{R}(\mathcal{H})$ is analogous to the relationship
between $\mathcal{H}$ and $\mathfrak{H}$. That is, given $\mathcal{H}$, the code is willing to accept all distributions
$r \in \mathcal{R}(\mathcal{H})$. Therefore a code is designed based on $\mathfrak{H}$ and $\mathcal{R}$, and then the achieved
rate depends at run time on $\mathcal{H}$ and $r$, where we assume $\mathcal{H} \in \mathfrak{H}$ and $r \in \mathcal{R}(\mathcal{H})$.
We therefore discuss not achievable rates $R$ but rather achievable rate functions
$R(\mathcal{H}, r)$. In fact, this applies only to variable-rate codes. In the fixed-rate case, no
run time rate decisions can be made, so achievable rates depend only on $\mathfrak{H}$ and $\mathcal{R}$.

3.1.4  Main  Results 

Our  main  results  give  explicit  characterizations  of  the  achievable  rates  for  three 
different  setups.  The  first,  which  is  discussed  in  the  most  depth,  is  the  variable- 
rate  case,  for  which  we  characterize  achievable  sum  rate  functions.  The  other  two 
setups  are  for  fixed-rate  coding,  divided  into  deterministic  and  randomized  coding, 
for which we give L-dimensional achievable rate regions. We show that randomized
coding  yields  a  larger  achievable  rate  region  than  deterministic  coding,  but  we 
believe  that  in  most  cases  randomized  fixed-rate  coding  requires  an  unrealistic 
assumption.  In  addition,  even  randomized  fixed-rate  coding  cannot  achieve  the 
same  sum  rates  as  variable-rate  coding. 

We  give  the  exact  solutions  later,  but  describe  here  some  intuition  behind  them. 
For variable-rate, the achievable rates, given in Theorem 7, are based on alternate
distributions on $(Y_1 \cdots Y_L)$. Specifically, given $W$, the traitors can simulate any
distribution $q(y_\mathcal{T}|w)$ to produce a fraudulent version of $Y_\mathcal{T}^n$, then report this
sequence as the truth. Suppose that the overall distribution $q(y_1 \cdots y_L)$ governing
the combination of the true value of $Y_\mathcal{H}^n$ with this fake value of $Y_\mathcal{T}^n$ could be
produced in several different ways, with several different sets of traitors. In that
case, the decoder cannot tell which of these several possibilities is the truth, which
means that from its point of view, many nodes might be honest. Since the error
requirement described in Section 3.1.1 stipulates that the decoder must produce a correct
estimate for every honest node, it must attempt to decode the source values
associated with each potentially honest node. Thus the sum rate must be at least the
joint entropy, when distributed according to $q$, of the sources associated with all
potentially honest nodes. The supremum over all possible simulated distributions
is the achievable sum rate.

For example, suppose $\mathfrak{H} = \mathfrak{H}_{L-1}$. That is, as few as one node may be honest. Then the
traitors are able to create the distribution $q(y_1 \cdots y_L) = p(y_1) \cdots p(y_L)$ no matter
which group of $L - 1$ nodes are the traitors. Thus every node appears as if it could
be the honest one, so the minimum achievable sum rate is

$$H(Y_1) + \cdots + H(Y_L). \qquad (3.2)$$

In other words, the decoder must use an independent source code for each node,
which requires receiving $nH(Y_i)$ bits from node $i$ for all $i$.

The  achievable  fixed-rate  regions,  given  in  Theorem  8,  are  based  on  the  Slepian- 
Wolf  achievable  rate  region.  For  randomized  fixed-rate  coding,  the  achievable 
region is such that for all $\mathcal{S} \in \mathfrak{H}$, the rates associated with the nodes in $\mathcal{S}$ fall into
the Slepian-Wolf rate region on the corresponding random variables. Note that
for $\mathfrak{H} = \{\{1, \ldots, L\}\}$, this is identical to the Slepian-Wolf region. For $\mathfrak{H} = \mathfrak{H}_{L-1}$,
this region is such that for all $i$, $R_i \ge H(Y_i)$, which corresponds to the sum
rate in (3.2). The deterministic fixed-rate achievable region is a subset of that of
randomized fixed-rate, but with an additional constraint stated in Section 3.6.

3.1.5  Randomization 

Randomization  plays  a  key  role  in  defeating  Byzantine  attacks.  As  we  have 
discussed,  allowing  randomized  encoding  in  the  fixed-rate  situation  expands  the 
achievable  region.  In  addition,  the  variable-rate  coding  scheme  that  we  propose 
relies  heavily  on  randomization  to  achieve  small  probability  of  error.  In  both  fixed 
and  variable-rate  coding,  randomization  is  used  as  follows.  Every  time  a  node 
transmits, it randomly chooses from a group of essentially identical encoding
functions. The index of the chosen function is transmitted to the decoder along with
its  output.  Without  this  randomization,  a  traitor  that  transmits  before  an  honest 
node  i  would  know  exactly  the  messages  that  node  i  will  send.  In  particular,  it 
would  be  able  to  find  fake  sequences  for  node  i  that  would  produce  those  same 
messages.  If  the  traitor  tailors  the  messages  it  sends  to  the  decoder  to  match  one 
of those fake sequences, when node i then transmits, it would appear to corroborate
this fake sequence, causing an error. By randomizing the choice of encoding
function, the set of sequences producing the same message is not fixed, so a traitor
can no longer know with certainty that a particular fake source sequence will result
in the same messages by node i as the true one. This is not unlike Wyner’s
wiretap channel [28], in which information is kept from the wiretapper by introducing
additional randomness. See in particular Section 3.5.4 for the proof that
variable-rate randomness can defeat the traitors in this manner.
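
The randomization step can be pictured with the following minimal Python sketch. It is an illustration only, not the construction analyzed in Section 3.5.4: a salted SHA-256 hash stands in for a family of randomly generated binning functions, and the names `bin_index` and `randomized_encode` are assumptions introduced here.

```python
import hashlib
import random

def bin_index(sequence, function_index, num_bins):
    """Bin of `sequence` under encoding function number `function_index`."""
    data = function_index.to_bytes(4, "big") + bytes(sequence)
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") % num_bins

def randomized_encode(sequence, num_functions, num_bins, rng):
    """Pick one function at random; send its index along with its output."""
    k = rng.randrange(num_functions)
    return k, bin_index(sequence, k, num_bins)

rng = random.Random(0)
message = randomized_encode([0, 1, 1, 0, 1], num_functions=16, num_bins=8, rng=rng)
```

Because the function index is drawn only when the node transmits, a traitor that transmits earlier cannot know in advance which sequences will share a bin with the true one.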

3.1.6  Organization

The  rest  of  this  chapter  is  organized  as  follows.  In  Section  3.2,  we  develop  in 
detail  the  case  that  there  are  three  nodes  and  one  traitor,  describing  a  coding 
scheme  that  achieves  the  optimum  sum  rate.  In  Section  3.3,  we  formally  give  the 
variable-rate  model  and  present  the  variable-rate  result.  In  Section  3.4,  we  discuss 
the  variable-rate  achievable  rate  region  and  give  an  analytic  formulation  for  the 
minimum  achievable  sum  rate  for  some  special  cases.  In  Section  3.6,  we  give  the 
fixed-rate  models  and  present  the  fixed-rate  result.  In  Sections  3.5  and  3.7,  we 
prove  the  variable-rate  and  fixed-rate  results  respectively. 


3.2  Three  Node  Example 

3.2.1  Potential  Traitor  Techniques 

For  simplicity  and  motivation,  we  first  explore  the  three-node  case  with  one  traitor. 
That is, $L = 3$ and

$$\mathfrak{H} = \{\{1,2\}, \{2,3\}, \{1,3\}\}.$$

Suppose also that the traitor has access to perfect information (i.e. $W = (Y_1, Y_2, Y_3)$).
Suppose node 3 is the traitor. Nodes 1 and 2 will behave honestly,
so they will report $Y_1$ and $Y_2$ correctly, as distributed according to the marginal
distribution $p(y_1 y_2)$. Since node 3 has access to the exact values of $Y_1$ and $Y_2$,
it may simulate the conditional distribution $p(y_3|y_2)$, then take the resulting $Y_3$
sequence and report it as the truth. Effectively, then, the three random variables
will be distributed according to the distribution

$$q(y_1 y_2 y_3) \triangleq p(y_1 y_2)\, p(y_3|y_2).$$

The  decoder  will  be  able  to  determine  that  nodes  1  and  2  are  reporting  jointly 
typical  sequences,  as  are  nodes  2  and  3,  but  not  nodes  1  and  3.  Therefore,  it  can 
tell  that  either  node  1  or  3  is  the  traitor,  but  not  which  one,  so  it  must  obtain 
estimates of the sources from all three nodes. Since the three streams are not
jointly typical with respect to the source distribution $p(y_1 y_2 y_3)$, standard Slepian-Wolf
coding on three encoders will not correctly decode them all. However, had we
known the strategy of the traitor, we could do Slepian-Wolf coding with respect
to the distribution $q$. This will take a sum rate of

$$H_q(Y_1 Y_2 Y_3) = H(Y_1 Y_2 Y_3) + I(Y_1; Y_3|Y_2)$$

where $H_q$ is the entropy with respect to $q$. In fact we will not do Slepian-Wolf
coding with respect to $q$ but rather something slightly different that gives the
same rate. Since Slepian-Wolf coding without traitors can achieve a sum rate of
$H(Y_1 Y_2 Y_3)$, we have paid a penalty of $I(Y_1; Y_3|Y_2)$ for the single traitor.
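
The penalty term can be checked directly from the chain rule. The short derivation below is a verification sketch, assuming only what is stated above: under the traitor's strategy the pair reported by nodes 1 and 2 keeps its true marginal, and the value reported by node 3 is generated from the second source alone using the true conditional of the third source given the second.

```latex
\begin{aligned}
H_q(Y_1 Y_2 Y_3) &= H(Y_1 Y_2) + H(Y_3 \mid Y_2) \\
                 &= H(Y_1 Y_2 Y_3) - H(Y_3 \mid Y_1 Y_2) + H(Y_3 \mid Y_2) \\
                 &= H(Y_1 Y_2 Y_3) + I(Y_1; Y_3 \mid Y_2).
\end{aligned}
```

The first line uses the fact that, under the simulated distribution, the third reported value is conditionally independent of the first given the second; the last line is the definition of conditional mutual information.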

We supposed that node 3 simulated the distribution $p(y_3|y_2)$. It could have just
as easily simulated $p(y_3|y_1)$, or another node could have been the traitor. Hence,
the minimum achievable sum rate for all $\mathcal{H} \in \mathfrak{H}$ is at least

$$R^* \triangleq H(Y_1 Y_2 Y_3) + \max\{I(Y_1; Y_2|Y_3),\ I(Y_1; Y_3|Y_2),\ I(Y_2; Y_3|Y_1)\}. \qquad (3.3)$$

In  fact,  this  is  exactly  the  minimum  achievable  sum  rate,  as  shown  below. 
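
For a concrete feel for the quantity in (3.3), the following Python sketch evaluates it for a small joint distribution. The helper names (`entropy`, `cond_mi`, `r_star`) and the example source are assumptions chosen for illustration; nodes 1, 2, 3 are indexed 0, 1, 2 in the code.

```python
from collections import defaultdict
from math import log2

def entropy(pmf, idx):
    """Entropy in bits of the coordinates idx of a joint pmf {tuple: prob}."""
    marg = defaultdict(float)
    for y, p in pmf.items():
        marg[tuple(y[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def cond_mi(pmf, a, b, c):
    """I(Y_a; Y_b | Y_c) = H(Y_a Y_c) + H(Y_b Y_c) - H(Y_a Y_b Y_c) - H(Y_c)."""
    return (entropy(pmf, (a, c)) + entropy(pmf, (b, c))
            - entropy(pmf, (a, b, c)) - entropy(pmf, (c,)))

def r_star(pmf):
    """Right-hand side of (3.3) for three nodes, indexed 0, 1, 2."""
    penalty = max(cond_mi(pmf, 0, 1, 2),
                  cond_mi(pmf, 0, 2, 1),
                  cond_mi(pmf, 1, 2, 0))
    return entropy(pmf, (0, 1, 2)) + penalty

# Hypothetical source: Y1 and Y2 are independent fair bits and Y3 = Y1.
pmf = {(a, b, a): 0.25 for a in (0, 1) for b in (0, 1)}
print(r_star(pmf))
```

For this example the joint entropy is 2 bits and the largest penalty is the conditional mutual information between the first and third sources given the second, which is 1 bit, so the expression evaluates to 3 bits.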

3.2.2  Variable-Rate  Coding  Scheme

We  now  give  a  variable-rate  coding  scheme  that  achieves  R*.  This  scheme  is 
somewhat  different  from  the  one  we  present  for  the  general  case  in  Section  3.5, 
but  it  is  much  simpler,  and  it  illustrates  the  basic  idea.  The  procedure  will  be 
made up of a number of rounds. Communication from node $i$ in the first round
will be based solely on the first $n$ values of $Y_i$, in the second round on the second
$n$ values of $Y_i$, and so on. The principal advantage of the round structure is that
the  decoder  may  hold  onto  information  that  is  carried  over  from  one  round  to  the 
next. 

In particular, the decoder maintains a collection $\mathfrak{V} \subseteq \mathfrak{H}$ representing the sets
that could be the set of honest nodes. If a node is completely eliminated from $\mathfrak{V}$,
that means it has been identified as the traitor. We begin with $\mathfrak{V} = \mathfrak{H}$, and then
remove a set from $\mathfrak{V}$ whenever we find that the messages from the corresponding
pair of nodes are not jointly typical. With high probability, the two honest nodes
report jointly typical sequences, so we expect never to eliminate the honest pair
from $\mathfrak{V}$. If the traitor employs the $q$ discussed above, for example, we would
expect nodes 1 and 3 to report atypical sequences, so we will drop $\{1,3\}$ from $\mathfrak{V}$.
In essence, the value of $\mathfrak{V}$ contains our current knowledge about what the traitor
is doing.

The procedure for a round is as follows. If $\mathfrak{V}$ contains $\{\{1,2\}, \{1,3\}\}$, do the
following:

1.  Receive $nH(Y_1)$ bits from node 1 and decode $y_1^n$.

2.  Receive $nH(Y_2|Y_1)$ bits from node 2. If there is a sequence in $\mathcal{Y}_2^n$ jointly
typical with $y_1^n$ that matches this transmission, decode that sequence to $y_2^n$.
If not, receive $nI(Y_1; Y_2)$ additional bits from node 2, decode $y_2^n$, and remove
$\{1,2\}$ from $\mathfrak{V}$.

3.  Do the same with node 3: Receive $nH(Y_3|Y_1)$ bits and decode $y_3^n$ if possible.
If not, receive $nI(Y_1; Y_3)$ additional bits, decode, and remove $\{1,3\}$ from $\mathfrak{V}$.

If $\mathfrak{V}$ is one of the other two subsets of $\mathfrak{H}$ with two elements, perform the same
procedure but replace node 1 with whichever node appears in both elements in
$\mathfrak{V}$. If $\mathfrak{V}$ contains just one element, then we have exactly identified the traitor, so
ignore the node that does not appear and simply do Slepian-Wolf coding on the
two remaining nodes.

Note  that  the  only  cases  when  the  number  of  bits  transmitted  exceeds  nR*  are 
when  we  receive  a  second  message  from  one  of  the  nodes,  which  happens  exactly 
when we eliminate an element from $\mathfrak{V}$. Assuming the source sequences of the two
honest  nodes  are  jointly  typical,  this  can  occur  at  most  twice,  so  we  can  always 
achieve  a  sum  rate  of  R*  when  averaged  over  enough  rounds. 
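
The decoder's bookkeeping across rounds can be sketched as follows. This is a simplified illustration, not the coding scheme itself: the actual encoding and typicality decoding are replaced by a hypothetical oracle `is_typical(pair)`, and the names `pivot_node` and `run_round` are assumptions introduced for the example. The sketch tracks only which candidate honest pairs survive and how many second messages are requested.

```python
def pivot_node(V):
    """Node that appears in at least two candidate honest pairs of V."""
    counts = {}
    for pair in V:
        for node in pair:
            counts[node] = counts.get(node, 0) + 1
    return min(node for node, c in counts.items() if c >= 2)

def run_round(V, is_typical):
    """One round of bookkeeping; returns (updated V, number of second messages)."""
    V = set(V)
    if len(V) == 1:
        return V, 0  # traitor identified: Slepian-Wolf on the remaining pair
    pivot = pivot_node(V)
    second_messages = 0
    for pair in sorted(V, key=sorted):
        if pivot in pair and not is_typical(pair):
            V.remove(pair)           # this pair cannot be the honest set
            second_messages += 1     # the non-pivot node sent extra bits
    return V, second_messages

pairs = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]
# Hypothetical traitor behaviour: only nodes 1 and 3 look atypical together.
is_typical = lambda pair: pair != frozenset({1, 3})
V, extra = run_round(pairs, is_typical)
```

In this toy run the first round removes the pair {1, 3} at the cost of one second message, and later rounds pivot on node 2 with no further removals, which mirrors the observation that extra transmissions occur only when an element is eliminated.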

3.2.3  Fixed-Rate  Coding  Scheme 

In  the  procedure  described  above,  the  number  of  bits  sent  by  a  node  changes  from 
round  to  round.  We  can  no  longer  do  this  with  fixed-rate  coding,  so  we  need 
a  different  approach.  Suppose  node  3  is  the  traitor.  It  could  perform  a  black 
hole attack, in which case the estimates for $Y_1^n$ and $Y_2^n$ must be based only on the
messages from nodes 1 and 2. Thus, the rates $R_1$ and $R_2$ must fall into the Slepian-Wolf
achievability region for $Y_1$ and $Y_2$. Similarly, if one of the other nodes was the
traitor, the other pairs of rates also must fall into the corresponding Slepian-Wolf
region. Putting these conditions together gives

$$\begin{aligned}
R_1 &\ge \max\{H(Y_1|Y_2),\ H(Y_1|Y_3)\} \\
R_2 &\ge \max\{H(Y_2|Y_1),\ H(Y_2|Y_3)\} \\
R_3 &\ge \max\{H(Y_3|Y_1),\ H(Y_3|Y_2)\} \\
R_1 + R_2 &\ge H(Y_1 Y_2) \\
R_1 + R_3 &\ge H(Y_1 Y_3) \\
R_2 + R_3 &\ge H(Y_2 Y_3).
\end{aligned} \qquad (3.4)$$

If  the  rates  fall  into  this  region,  we  can  do  three  simultaneous  Slepian-Wolf  codes, 
one  on  each  pair  of  nodes,  thereby  constructing  two  estimates  for  each  node.  If 
we  randomize  these  codes  using  the  method  described  in  Section  3.1.5,  the  traitor 
will  be  forced  either  to  report  the  true  message,  or  report  a  false  message,  which 
with  high  probability  will  be  detected  as  such.  Thus  either  the  two  estimates  for 
each  node  will  be  the  same,  in  which  case  we  know  both  are  correct,  or  one  of  the 
estimates  will  be  demonstrably  false,  in  which  case  the  other  is  correct. 


We now show that the region given by (3.4) does not include sum rates as low
as $R^*$. Assume without loss of generality that $I(Y_1; Y_2|Y_3)$ achieves the maximum
in (3.3). Summing the last three conditions in (3.4) gives

$$\begin{aligned}
R_1 + R_2 + R_3 &\ge \tfrac{1}{2}\bigl(H(Y_1 Y_2) + H(Y_1 Y_3) + H(Y_2 Y_3)\bigr) \\
&= H(Y_1 Y_2 Y_3) + \tfrac{1}{2}\bigl(I(Y_1; Y_2|Y_3) + I(Y_1 Y_2; Y_3)\bigr).
\end{aligned} \qquad (3.5)$$

If $I(Y_1 Y_2; Y_3) > I(Y_1; Y_2|Y_3)$, (3.5) is larger than (3.3). Hence, there exist source
distributions for which we cannot achieve the same sum rates with even randomized
fixed-rate coding as with variable-rate coding.
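
The identity behind (3.5) and the resulting gap can be verified step by step. The following is a verification sketch using only standard entropy identities, under the same assumption that the conditional mutual information between the first two sources given the third achieves the maximum in (3.3):

```latex
\begin{aligned}
H(Y_1 Y_3) + H(Y_2 Y_3) &= H(Y_1 Y_2 Y_3) + H(Y_3) + I(Y_1; Y_2 \mid Y_3), \\
H(Y_1 Y_2) &= H(Y_1 Y_2 Y_3) - H(Y_3) + I(Y_1 Y_2; Y_3), \\
\tfrac{1}{2}\bigl(H(Y_1 Y_2) + H(Y_1 Y_3) + H(Y_2 Y_3)\bigr) - R^*
  &= \tfrac{1}{2}\bigl(I(Y_1 Y_2; Y_3) - I(Y_1; Y_2 \mid Y_3)\bigr).
\end{aligned}
```

As a simple instance, if all three sources are the same fair bit, then the variable-rate sum rate in (3.3) is 1 bit, while the fixed-rate bound in (3.5) is 3/2 bits.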

If  we  are  interested  only  in  deterministic  codes,  the  region  given  by  (3.4)  can  no 
longer  be  achieved.  In  fact,  we  will  prove  in  Section  3.7  that  the  achievable  region 
reduces to the trivially achievable region where $R_i \ge H(Y_i)$ for all $i$ when $L = 3$,
though it is nontrivial for $L > 3$. For example, suppose $L = 4$ and $\mathfrak{H} = \mathfrak{H}_1$. In this
case, the achievable region is similar to that given by (3.4), but with an additional
node. That is, each of the 6 pairs of rates must fall into the corresponding Slepian-Wolf
region. In this case, we do three simultaneous Slepian-Wolf codes for each
node, constructing three estimates, each associated with one of the other nodes. For
an  honest  node,  only  one  of  the  other  nodes  could  be  a  traitor,  so  at  least  two  of 
these  estimates  must  be  correct.  Thus  we  need  only  take  the  plurality  of  the  three 
estimates  to  obtain  the  correct  estimate. 
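
The final plurality step is simple enough to sketch directly. The function name `plurality` and the example bit strings are assumptions made for illustration.

```python
from collections import Counter

def plurality(estimates):
    """Return the most common estimate; ties go to the earliest-seen value."""
    return Counter(estimates).most_common(1)[0][0]

# Three estimates of one honest node's sequence, one built with a traitor's help.
print(plurality(["0110", "0110", "1011"]))
```

With four nodes and one traitor, at least two of the three estimates for an honest node agree on the true sequence, so the plurality is correct whenever the individual Slepian-Wolf decodings succeed.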


3.3  Variable-Rate  Model  and  Result 

3.3.1  Notation 

Let $Y_i$ be the random variable revealed to node $i$, $\mathcal{Y}_i$ the alphabet of that variable,
and $y_i$ a corresponding realization. A sequence of random variables revealed
to node $i$ over $n$ timeslots is denoted $Y_i^n$, and a realization of it $y_i^n \in \mathcal{Y}_i^n$. Let
$\mathcal{M} = \{1, \ldots, L\}$. For a set $\mathcal{S} \subseteq \mathcal{M}$, let $Y_\mathcal{S}$ be the set of random variables $\{Y_i\}_{i \in \mathcal{S}}$,
and define $y_\mathcal{S}$ and $\mathcal{Y}_\mathcal{S}$ similarly. By $\mathcal{S}^c$ we mean $\mathcal{M} \setminus \mathcal{S}$. Let $T_\epsilon^n(Y_\mathcal{S})[q]$ be the strongly
typical set with respect to the distribution $q$, or the source distribution $p$ if
unspecified. Similarly, $H_q(Y_\mathcal{S})$ is the entropy with respect to the distribution $q$, or $p$
if unspecified.

3.3.2  Communication  Protocol

The  transmission  protocol  is  composed  of  T  transactions.  In  each  transaction,  the 
decoder selects a node to receive information from and selects which of $K$ encoding
functions it should use. The node then responds by executing that encoding
function  and  transmitting  its  output  back  to  the  decoder,  which  then  uses  the  new 
information  to  begin  the  next  transaction. 

For each node $i \in \mathcal{M}$ and encoding function $j \in \{1, \ldots, K\}$, there is an
associated rate $R_{i,j}$. On the $t$th transaction, let $i_t$ be the node and $j_t$ the encoding
function chosen by the decoder, and let $h_t$ be the number of $t' \in \{1, \ldots, t-1\}$
such that $i_{t'} = i_t$. That is, $h_t$ is the number of times $i_t$ has transmitted prior to the
$t$th transaction. Note that $i_t, j_t, h_t$ are random variables, since they are chosen by
the decoder based on messages it has received, which depend on the source values.
The $j$th encoding function for node $i$ is given by

$$f_{i,j} : \mathcal{Y}_i^n \times \mathcal{Z} \times \{1, \ldots, K\}^{h_t} \to \{1, \ldots, 2^{nR_{i,j}}\} \qquad (3.6)$$

where $\mathcal{Z}$ represents randomness generated at the node. Let $I_t \in \{1, \ldots, 2^{nR_{i_t,j_t}}\}$
be the message received by the decoder in the $t$th transaction. If $i_t$ is honest,
then $I_t = f_{i_t,j_t}(Y_{i_t}^n, \rho_{i_t}, J_t)$, where $\rho_{i_t} \in \mathcal{Z}$ is the randomness from node $i_t$
and $J_t \in \{1, \ldots, K\}^{h_t}$ is the history of encoding functions used by node $i_t$ so
far. If $i_t$ is a traitor, however, it may choose $I_t$ based on $W^n$ and it may have
any amount of access to previous transmissions $I_1, \ldots, I_{t-1}$ and polling history
$i_1, \ldots, i_{t-1}$ and $j_1, \ldots, j_{t-1}$. But it does not have access to the randomness $\rho_i$ for any
honest node $i$. Note again that the amount of traitor eavesdropping ability has no
effect on achievable rates.

After the decoder receives $I_t$, if $t < T$ it uses $I_1, \ldots, I_t$ to choose the next node
$i_{t+1}$ and its encoding function index $j_{t+1}$. After the $T$th transaction, it decodes
according to the decoding function

$$g : \prod_{t=1}^{T} \{1, \ldots, 2^{nR_{i_t,j_t}}\} \to \mathcal{Y}_1^n \times \cdots \times \mathcal{Y}_L^n.$$

Note  that  we  impose  no  restriction  whatsoever  on  the  size  of  the  total  number 
of  transactions  T .  Thus,  a  code  could  have  arbitrary  complexity  in  terms  of  the 
number of messages passed between the nodes and the decoder. However, in our
definition of achievability below, we require that the communication rate from
nodes  to  decoder  always  exceeds  that  from  decoder  to  nodes.  Therefore  while  the 
number  of  messages  may  be  very  large,  the  amount  of  feedback  is  diminishingly 
small. 


3.3.3  Variable-Rate  Problem  Statement  and  Main  Result 

Let $\mathcal{H} \subseteq \mathcal{M}$ be the set of honest nodes. Define the probability of error

$$P_e \triangleq \Pr\left(\hat{Y}_\mathcal{H}^n \ne Y_\mathcal{H}^n\right)$$

where $(\hat{Y}_1^n, \ldots, \hat{Y}_L^n) = g(I_1, \ldots, I_T)$. The probability of error will in general
depend on the actions of the traitors. Note again that we only require small
probability of error on the source estimates corresponding to the honest nodes.

We define a rate function $R(\mathcal{H}, r)$ defined for $\mathcal{H} \in \mathfrak{H}$ and $r \in \mathcal{R}(\mathcal{H})$ to be
$\alpha$-achievable if there exists a code such that, for all pairs $(\mathcal{H}, r)$ and any choice of
actions by the traitors, $P_e \le \alpha$,

$$\Pr\left(\sum_{t=1}^{T} R_{i_t,j_t} \le R(\mathcal{H}, r)\right) \ge 1 - \alpha$$

and $\log K \le \alpha n R_{i,j}$ for all $i, j$. This last condition requires, as discussed above,
that the feedback rate from the decoder back to the nodes is arbitrarily small
compared to the forward rate. A rate function $R(\mathcal{H}, r)$ is achievable if for all
$\alpha > 0$, there is a sequence of $\alpha$-achievable rate functions $\{R'_k(\mathcal{H}, r)\}_{k=1}^{\infty}$ such that

$$\lim_{k \to \infty} R'_k(\mathcal{H}, r) = R(\mathcal{H}, r).$$

Note that we do not require uniform convergence.

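The following definitions allow us to state our main variable-rate result. For
any $\mathcal{H} \in \mathfrak{H}$ and $r \in \mathcal{R}(\mathcal{H})$, let $\bar{r}(w|y_\mathcal{H})$ be the distribution of $W$ given $Y_\mathcal{H}$ when $W$
is distributed according to $r(w|y_\mathcal{M})$. That is,

$$\bar{r}(w|y_\mathcal{H}) = \sum_{y_\mathcal{T}} p(y_\mathcal{T}|y_\mathcal{H})\, r(w|y_\mathcal{H}\, y_\mathcal{T}).$$

The extent to which $W$ provides information about $Y_{\mathcal{H}^c}$ is irrelevant to the traitors,
since in order to fool the decoder they must generate information that appears to
agree only with $Y_\mathcal{H}$ as reported by the honest nodes. Thus it will usually be more
convenient to work with $\bar{r}$ rather than $r$. For any $\mathcal{S} \in \mathfrak{H}$ and $r' \in \mathcal{R}(\mathcal{S})$, let

$$\mathcal{Q}_{\mathcal{S},r'} \triangleq \Bigl\{ p(y_\mathcal{S}) \sum_{w} \bar{r}'(w|y_\mathcal{S})\, q(y_{\mathcal{S}^c}|w) : \forall\, q(y_{\mathcal{S}^c}|w) \Bigr\}. \qquad (3.7)$$

If $\mathcal{S}^c$ were the traitors and $W$ were distributed according to $r'$, then $\mathcal{Q}_{\mathcal{S},r'}$ would
be the set of distributions $q$ to which the traitors would have access. That is, if
they simulate the proper $q(y_{\mathcal{S}^c}|w)$ from their received $W$, this simulated version of
$Y_{\mathcal{S}^c}$ and the true value of $Y_\mathcal{S}$ would be jointly distributed according to $q$. For any
$\mathfrak{V} \subseteq \mathfrak{H}$, define

$$\mathcal{Q}(\mathfrak{V}) \triangleq \bigcap_{\mathcal{S} \in \mathfrak{V}}\ \bigcup_{r' \in \mathcal{R}(\mathcal{S})} \mathcal{Q}_{\mathcal{S},r'}, \qquad \mathcal{U}(\mathfrak{V}) \triangleq \bigcup_{\mathcal{S} \in \mathfrak{V}} \mathcal{S}.$$

That is, for some distribution $q \in \mathcal{Q}(\mathfrak{V})$, for every $\mathcal{S} \in \mathfrak{V}$, if the traitors were $\mathcal{S}^c$,
they would have access to $q$ for some $r' \in \mathcal{R}(\mathcal{S})$. Thus any distribution in $\mathcal{Q}(\mathfrak{V})$
makes it look to the decoder like any $\mathcal{S} \in \mathfrak{V}$ could be the set of honest nodes, so
any node $i \in \mathcal{U}(\mathfrak{V})$ is potentially honest.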
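
The set of potentially honest nodes is just the union of the candidate honest sets, which the following Python sketch makes explicit. The names `potentially_honest` and `sure_traitors` are assumptions introduced for this example.

```python
def potentially_honest(V):
    """Union of all candidate honest sets in the collection V."""
    return frozenset().union(*V)

def sure_traitors(V, L):
    """Nodes of {1, ..., L} that appear in no candidate honest set."""
    return frozenset(range(1, L + 1)) - potentially_honest(V)

V = [frozenset({1, 2}), frozenset({2, 3})]
print(sorted(potentially_honest(V)))       # every node may still be honest
print(sorted(sure_traitors([frozenset({1, 2})], 3)))  # node 3 is exposed
```

The decoder must recover the sequence of every node in the union, which is why the entropy in the main result is taken over exactly these nodes.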

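Theorem 7  A rate function $R(\mathcal{H}, r)$ is achievable if and only if, for all $(\mathcal{H}, r)$,

$$R(\mathcal{H}, r) \ge R^*(\mathcal{H}, r) \triangleq \sup_{\mathfrak{V} \subseteq \mathfrak{H},\ q \in \mathcal{Q}_{\mathcal{H},r} \cap \mathcal{Q}(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}). \qquad (3.8)$$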

See  Section  3.5  for  the  proof. 

We  offer  the  following  interpretation  of  this  result.  Suppose  we  placed  the 
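following constraint on the traitors’ behavior. Given $W^n$, they must produce a
value of $Y_\mathcal{T}^n$ in an i.i.d. fashion, then report it as the truth. That is, they choose a
value of $Y_\mathcal{T}$ at time $\tau$ based only on $W$ at time $\tau$, making each choice in an identical
manner. Then each traitor $i$ takes the produced value of $Y_i^n$ and behaves for the
duration of the coding session exactly as if it were honest and this was the true
source sequence. We can now easily classify all possible behaviors of the traitors
simply by specifying the manner in which they generate $Y_\mathcal{T}$ from $W$, which is given
by some distribution $q(y_\mathcal{T}|w)$. The joint distribution of $Y_\mathcal{H}$ and $Y_\mathcal{T}$ will be given by

$$q(y_\mathcal{M}) = p(y_\mathcal{H}) \sum_{w} \bar{r}(w|y_\mathcal{H})\, q(y_\mathcal{T}|w). \qquad (3.9)$$

By (3.7), $q \in \mathcal{Q}_{\mathcal{H},r}$. If $q$ is also contained in $\mathcal{Q}_{\mathcal{S},r'}$ for some $\mathcal{S} \in \mathfrak{H}$ and $r' \in \mathcal{R}(\mathcal{S})$,
then again by (3.7), there exists a distribution $q'(y_{\mathcal{S}^c}|w)$ such that

$$q(y_\mathcal{M}) = p(y_\mathcal{S}) \sum_{w} \bar{r}'(w|y_\mathcal{S})\, q'(y_{\mathcal{S}^c}|w). \qquad (3.10)$$

Since (3.9) and (3.10) have exactly the same form, the decoder will not be able
to determine whether $\mathcal{H}$ is the set of honest nodes with $W$ distributed according
to $r$, or $\mathcal{S}$ is the set of honest nodes with $W$ distributed according to $r'$. On the
other hand, if for some $\mathcal{S} \in \mathfrak{H}$, $q \notin \mathcal{Q}_{\mathcal{S},r'}$ for all $r' \in \mathcal{R}(\mathcal{S})$, then the decoder should
be able to tell that $\mathcal{S}$ is not the set of honest nodes. We have not yet said how it
might know, but intuition suggests that it should be possible. Hence, if there is
no $\mathcal{S}$ containing a certain node $i$ for which

$$q \in \bigcup_{r' \in \mathcal{R}(\mathcal{S})} \mathcal{Q}_{\mathcal{S},r'} \qquad (3.11)$$

then the decoder can be sure that $i$ is a traitor and it may be ignored. Let $\mathfrak{V}$ be
the collection of all $\mathcal{S} \in \mathfrak{H}$ for which (3.11) holds. Every node in $\mathcal{U}(\mathfrak{V})$ looks to
the decoder like it could be honest; all the rest are surely traitors. Thus, in order
to make sure that the decoder reconstructs honest information perfectly, it must
recover $Y_i^n$ for all $i \in \mathcal{U}(\mathfrak{V})$, which means the sum rate must be at least $H_q(Y_{\mathcal{U}(\mathfrak{V})})$.
Observe that

$$q \in \bigcap_{\mathcal{S} \in \mathfrak{V}}\ \bigcup_{r' \in \mathcal{R}(\mathcal{S})} \mathcal{Q}_{\mathcal{S},r'} = \mathcal{Q}(\mathfrak{V}).$$

As already noted, $q \in \mathcal{Q}_{\mathcal{H},r}$, so $q \in \mathcal{Q}_{\mathcal{H},r} \cap \mathcal{Q}(\mathfrak{V})$. Moreover, for any $\mathfrak{V} \subseteq \mathfrak{H}$, every
element of $\mathcal{Q}_{\mathcal{H},r} \cap \mathcal{Q}(\mathfrak{V})$ can be produced with the proper choice of $q(y_\mathcal{T}|w)$. Hence
$H_q(Y_{\mathcal{U}(\mathfrak{V})})$ can be as high as

$$\sup_{\mathfrak{V} \subseteq \mathfrak{H},\ q \in \mathcal{Q}_{\mathcal{H},r} \cap \mathcal{Q}(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}) = R^*(\mathcal{H}, r)$$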

but  no  higher.  Thus  it  makes  sense  that  this  rate  and  no  better  can  be  achieved  if 
we  place  this  constraint  on  the  traitors.  Therefore  Theorem  7  can  be  interpreted 
as  stating  that  constraining  the  traitors  in  this  manner  has  no  effect  on  the  set  of 
achievable  rates. 


3.4  Properties  of  the  Variable-Rate  Region 

It  might  at  first  appear  that  (3.8)  does  not  agree  with  (3.3).  We  discuss  several 
ways  in  which  (3.8)  can  be  made  more  manageable,  particularly  in  the  case  of 
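perfect traitor information (i.e. $W = Y_\mathcal{M}$), and show that the two are in fact
identical. Let $R^*$ be the minimum rate achievable over all $\mathcal{H} \in \mathfrak{H}$ and $r \in \mathcal{R}(\mathcal{H})$.
Thus by (3.8), we can write

$$R^* = \sup_{\mathcal{H} \in \mathfrak{H},\ r \in \mathcal{R}(\mathcal{H})} R^*(\mathcal{H}, r) = \sup_{\mathfrak{V} \subseteq \mathfrak{H},\ q \in \mathcal{Q}(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}). \qquad (3.12)$$

This is the quantity that appears in (3.3). Note also that for perfect traitor
information,

$$\mathcal{Q}_{\mathcal{S},r'} = \{q(y_\mathcal{M}) : q(y_\mathcal{S}) = p(y_\mathcal{S})\}. \qquad (3.13)$$

This means that $\mathcal{Q}_{\mathcal{H},r} \cap \mathcal{Q}(\mathfrak{V}) = \mathcal{Q}(\mathfrak{V} \cup \{\mathcal{H}\})$. Therefore (3.8) becomes

$$R^*(\mathcal{H}, r) = \sup_{\mathfrak{V} \subseteq \mathfrak{H} :\, \mathcal{H} \in \mathfrak{V},\ q \in \mathcal{Q}(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}).$$

The following lemma simplifies calculation of expressions of the form
$\sup_{q \in \mathcal{Q}(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})})$.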

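Lemma 7  Suppose the traitors have perfect information. For any $\mathfrak{V} \subseteq \mathfrak{H}$, the
expression

$$\sup_{q \in \mathcal{Q}(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}) \qquad (3.14)$$

is maximized by a $q$ satisfying (3.13) for all $\mathcal{S} \in \mathfrak{V}$ such that, for some set of
functions $\{\sigma_\mathcal{S}\}_{\mathcal{S} \in \mathfrak{V}}$,

$$q(y_1 \cdots y_L) = \prod_{\mathcal{S} \in \mathfrak{V}} \sigma_\mathcal{S}(y_\mathcal{S}). \qquad (3.15)$$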

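Proof:  By (3.13), we need to maximize $H_q(Y_{\mathcal{U}(\mathfrak{V})})$ subject to the constraints that
for each $\mathcal{S} \in \mathfrak{V}$ and all $y_\mathcal{S} \in \mathcal{Y}_\mathcal{S}$, $q(y_\mathcal{S}) = p(y_\mathcal{S})$. This amounts to maximizing the
Lagrangian

$$\Lambda = -\sum_{y_{\mathcal{U}(\mathfrak{V})} \in \mathcal{Y}_{\mathcal{U}(\mathfrak{V})}} q(y_{\mathcal{U}(\mathfrak{V})}) \log q(y_{\mathcal{U}(\mathfrak{V})}) + \sum_{\mathcal{S} \in \mathfrak{V}} \sum_{y_\mathcal{S} \in \mathcal{Y}_\mathcal{S}} \lambda_\mathcal{S}(y_\mathcal{S}) \bigl(q(y_\mathcal{S}) - p(y_\mathcal{S})\bigr).$$

Note that for any $\mathcal{S} \subseteq \mathcal{U}(\mathfrak{V})$,

$$\frac{\partial q(y_\mathcal{S})}{\partial q(y_{\mathcal{U}(\mathfrak{V})})} = 1.$$

Thus, differentiating with respect to $q(y_{\mathcal{U}(\mathfrak{V})})$ gives, assuming the log is a natural
logarithm,

$$\frac{\partial \Lambda}{\partial q(y_{\mathcal{U}(\mathfrak{V})})} = -\log q(y_{\mathcal{U}(\mathfrak{V})}) - 1 + \sum_{\mathcal{S} \in \mathfrak{V}} \lambda_\mathcal{S}(y_\mathcal{S}).$$

Setting this to 0 gives

$$q(y_{\mathcal{U}(\mathfrak{V})}) = \exp\Bigl(-1 + \sum_{\mathcal{S} \in \mathfrak{V}} \lambda_\mathcal{S}(y_\mathcal{S})\Bigr) = |\mathcal{Y}_{\mathcal{U}(\mathfrak{V})^c}| \prod_{\mathcal{S} \in \mathfrak{V}} \sigma_\mathcal{S}(y_\mathcal{S})$$

for some set of functions $\{\sigma_\mathcal{S}\}_{\mathcal{S} \in \mathfrak{V}}$. Therefore setting

$$q(y_1 \cdots y_L) = \frac{q(y_{\mathcal{U}(\mathfrak{V})})}{|\mathcal{Y}_{\mathcal{U}(\mathfrak{V})^c}|}$$

satisfies (3.15), so if $\sigma_\mathcal{S}$ are such that (3.13) is satisfied for all $\mathcal{S} \in \mathfrak{V}$, $q$ will maximize
$H_q(Y_{\mathcal{U}(\mathfrak{V})})$.  □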


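Suppose $L = 3$ and $\mathfrak{H} = \mathfrak{H}_1$. If $\mathfrak{V} = \{\{1,2\}, \{2,3\}\}$, then $q(y_1 y_2 y_3) =
p(y_1 y_2) p(y_3 | y_2)$ is in $\mathcal{Q}(\mathfrak{V})$ and by Lemma 7 maximizes $H_q(Y_1 Y_2 Y_3)$ over all
$q \in \mathcal{Q}(\mathfrak{V})$. Thus

\[ \sup_{q \in \mathcal{Q}(\mathfrak{V})} H_q(Y_1 Y_2 Y_3) = H(Y_1 Y_2) + H(Y_3 | Y_2) = H(Y_1 Y_2 Y_3) + I(Y_1; Y_3 | Y_2). \]

By similar reasoning, considering $\mathfrak{V} = \{\{1,2\}, \{1,3\}\}$ and $\mathfrak{V} = \{\{1,3\}, \{2,3\}\}$
results in (3.3). Note that if $\mathfrak{V}_1 \subset \mathfrak{V}_2$, then $\mathcal{Q}(\mathfrak{V}_1) \supset \mathcal{Q}(\mathfrak{V}_2)$, so $\mathfrak{V}_2$ need not be
considered in evaluating (3.8). Thus we have ignored larger subsets of $\mathfrak{H}_1$, since
the value they give would be no greater than the others.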
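
The identity in the three-node example can be checked numerically. The following is a minimal sketch, assuming a randomly generated joint distribution on small alphabets; the helper names (`entropy`, `marginal`, `random_joint`) are illustrative and do not come from the text. It builds $q(y_1 y_2 y_3) = p(y_1 y_2) p(y_3 | y_2)$ and compares $H_q(Y_1 Y_2 Y_3)$ with $H(Y_1 Y_2 Y_3) + I(Y_1; Y_3 | Y_2)$.

```python
import itertools
import math
import random


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(pmf, idx):
    """Marginal of a joint pmf over tuples, keeping the coordinates in idx."""
    out = {}
    for y, p in pmf.items():
        key = tuple(y[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def random_joint(alphabet_sizes, seed=0):
    """Hypothetical test distribution: normalized random weights."""
    rng = random.Random(seed)
    outcomes = list(itertools.product(*[range(k) for k in alphabet_sizes]))
    weights = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(weights)
    return {y: w / total for y, w in zip(outcomes, weights)}


p = random_joint((2, 3, 2))
p12 = marginal(p, (0, 1))
p23 = marginal(p, (1, 2))
p2 = marginal(p, (1,))

# q(y1, y2, y3) = p(y1, y2) p(y3 | y2): matches p on {1,2} and on {2,3}.
q = {y: p12[(y[0], y[1])] * p23[(y[1], y[2])] / p2[(y[1],)] for y in p}

# I(Y1; Y3 | Y2) = H(Y1 Y2) + H(Y2 Y3) - H(Y2) - H(Y1 Y2 Y3), all under p.
cond_mi = entropy(p12) + entropy(p23) - entropy(p2) - entropy(p)

lhs = entropy(q)             # H_q(Y1 Y2 Y3)
rhs = entropy(p) + cond_mi   # H(Y1 Y2 Y3) + I(Y1; Y3 | Y2)
print(lhs, rhs)
```

The two printed values agree up to floating-point error, which is just the chain-rule computation $H(Y_1 Y_2) + H(Y_3 | Y_2)$ written two ways.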


We can generalize to any collection $\mathfrak{V}$ of the form

\[ \{\{\mathcal{S}_1, \mathcal{S}_2\}, \{\mathcal{S}_1, \mathcal{S}_3\}, \ldots, \{\mathcal{S}_1, \mathcal{S}_k\}\} \]
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in  which  case 


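\[ \sup_{q \in \mathcal{Q}(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}) = H(Y_{\mathcal{S}_1} Y_{\mathcal{S}_2}) + H(Y_{\mathcal{S}_3} | Y_{\mathcal{S}_1}) + \cdots + H(Y_{\mathcal{S}_k} | Y_{\mathcal{S}_1}). \]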

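Employing this, we can rewrite (3.12) for $\mathfrak{H} = \mathfrak{H}_s$ and certain values of $s$. For
$s = 1$, it becomes

\[ R^* = H(Y_1 \cdots Y_L) + \max_{i, i' \in \mathcal{M}} I(Y_i; Y_{i'} \mid Y_{\{i,i'\}^c}). \]

Again, relative to the Slepian-Wolf result, we always pay a conditional mutual
information penalty for a single traitor. For $s = 2$,

\[ R^* = H(Y_1 \cdots Y_L) + \max\Big\{ \max_{\mathcal{S}, \mathcal{S}' \subset \mathcal{M} :\, |\mathcal{S}| = |\mathcal{S}'| = 2} I(Y_{\mathcal{S}}; Y_{\mathcal{S}'} \mid Y_{(\mathcal{S} \cup \mathcal{S}')^c}),\ \max_{i, i', i'' \in \mathcal{M}} I(Y_i; Y_{i'}; Y_{i''} \mid Y_{\{i,i',i''\}^c}) \Big\}, \]

where $I(X; Y; Z | W) = H(X|W) + H(Y|W) + H(Z|W) - H(XYZ|W)$. For $s = L - 1$,
$R^*$ is given by (3.2). There is a similar formulation for $s = L - 2$, though
it is more difficult to write down for arbitrary $L$.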


With all these expressions made up of nothing but entropies and mutual
informations, it might seem hopeful that (3.14) can be reduced to such an
analytic expression for all $\mathfrak{V}$. However, this is not the case. For example, consider
$\mathfrak{V} = \{\{1,2,3\}, \{3,4,5\}, \{5,6,1\}\}$. This $\mathfrak{V}$ is irreducible in the sense that there is
no subset $\mathfrak{V}'$ that still satisfies $\mathcal{U}(\mathfrak{V}') = \{1, \ldots, 6\}$, but there is no simple
distribution $q \in \mathcal{Q}(\mathfrak{V})$ made up of marginals of $p$ that satisfies Lemma 7, so it must be
found numerically. Still, Lemma 7 simplifies the calculation considerably.
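
One standard way to carry out such a numerical search is iterative proportional fitting (IPF). The text does not say which numerical method was used, so this is a swapped-in technique. Started from the uniform distribution, IPF converges to the maximum-entropy distribution with the prescribed marginals, and its limit has the product form (3.15). The sketch below applies it to the six-node collection above, assuming binary alphabets and a randomly generated $p$; all function names are illustrative.

```python
import itertools
import math
import random


def marginal(pmf, idx):
    """Marginal of a joint pmf over tuples, keeping the coordinates in idx."""
    out = {}
    for y, p in pmf.items():
        key = tuple(y[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def random_joint(sizes, seed):
    """Hypothetical source distribution p: normalized random weights."""
    rng = random.Random(seed)
    outcomes = list(itertools.product(*[range(k) for k in sizes]))
    weights = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(weights)
    return {y: w / total for y, w in zip(outcomes, weights)}


def ipf(sizes, targets, sweeps=500):
    """Iterative proportional fitting from a uniform start.

    targets maps each index tuple S to the required marginal p(y_S).
    Each step rescales q so that its S-marginal equals the target.
    """
    outcomes = list(itertools.product(*[range(k) for k in sizes]))
    q = {y: 1.0 / len(outcomes) for y in outcomes}
    for _ in range(sweeps):
        for idx, target in targets.items():
            current = marginal(q, idx)
            for y in q:
                key = tuple(y[i] for i in idx)
                if current[key] > 0:
                    q[y] *= target[key] / current[key]
    return q


sizes = (2,) * 6
p = random_joint(sizes, seed=1)
# Zero-based version of {{1,2,3},{3,4,5},{5,6,1}}.
collection = [(0, 1, 2), (2, 3, 4), (4, 5, 0)]
targets = {S: marginal(p, S) for S in collection}
q = ipf(sizes, targets)
print("H(p) =", entropy(p), " sup H_q ~", entropy(q))
```

Because $p$ itself satisfies the marginal constraints, the fitted entropy is never smaller than $H(Y_1 \cdots Y_6)$ under $p$; the gap is the rate penalty for this collection.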
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3.5  Proof  of  Theorem  7 

3.5.1  Converse 

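We first show the converse. Fix $\mathcal{H} \in \mathfrak{H}$ and $r \in \mathcal{R}(\mathcal{H})$. Take any $\mathfrak{V} \subset \mathfrak{H}$, and
any distribution $q \in \mathcal{Q}_{\mathcal{H},r} \cap \mathcal{Q}(\mathfrak{V})$. Since $q \in \mathcal{Q}_{\mathcal{H},r}$, there is some $q(y_{\mathcal{T}} | w)$ such
that $Y_{\mathcal{H}}$ and $Y_{\mathcal{T}}$ are distributed according to $q$. Since also $q \in \mathcal{Q}_{\mathcal{S},r'}$ for all $\mathcal{S} \in \mathfrak{V}$
and some $r' \in \mathcal{R}(\mathcal{S})$, if the traitors simulate this $q$ and act honestly with these
fabricated source values, the decoder will not be able to determine which of the
sets in $\mathfrak{V}$ is the actual set of honest nodes. Thus, the decoder must perfectly
decode the sources from all nodes in $\mathcal{U}(\mathfrak{V})$, so if $R(\mathcal{H}, r)$ is a precisely $\alpha$-achievable
rate function, $R(\mathcal{H}, r) \ge H_q(Y_{\mathcal{U}(\mathfrak{V})})$.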

3.5.2  Achievability  Preliminaries 

Now we prove achievability. To do so, we will first need the theory of types. Given
$y^n \in \mathcal{Y}^n$, let $t(y^n)$ be the type of $y^n$. Given a type $t$ with denominator $n$, let
$\Lambda_t^n(Y)$ be the set of all sequences in $\mathcal{Y}^n$ with type $t$. If $t$ is a joint $y, z$ type with
denominator $n$, then let $\Lambda_t^n(Y | z^n)$ be the set of sequences $y^n \in \mathcal{Y}^n$ such that $(y^n z^n)$
have joint type $t$, with the convention that this set is empty if the type of $z^n$ is not
the marginal of $t$.

We will also need the following definitions. Given a distribution $q$ on an
alphabet $\mathcal{Y}$, define the $\eta$-ball of distributions

\[ B_\eta(q) \triangleq \Big\{ q'(y) : \forall y \in \mathcal{Y} : |q(y) - q'(y)| \le \frac{\eta}{|\mathcal{Y}|} \Big\}. \]
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Note  that  the  typical  set  can  be  written 

\[ T_\epsilon^n(Y) = \{ y^n : t(y^n) \in B_\epsilon(p) \}. \]
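
As a concrete illustration of these definitions, the sketch below computes the type of a sequence and tests membership in an $\eta$-ball, which gives a typicality test. It assumes the ball is defined by the per-symbol bound $|q(y) - q'(y)| \le \eta / |\mathcal{Y}|$; the function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction


def type_of(seq, alphabet):
    """Empirical distribution (type) of seq, with denominator len(seq)."""
    n = len(seq)
    counts = Counter(seq)
    return {a: Fraction(counts.get(a, 0), n) for a in alphabet}


def in_ball(q, q_prime, eta):
    """True if q_prime lies in the eta-ball around q (assumed definition)."""
    bound = Fraction(eta) / len(q)
    return all(abs(q[a] - q_prime[a]) <= bound for a in q)


def is_typical(seq, p, eps):
    """y^n is typical iff its type lies in the eps-ball around p."""
    return in_ball(p, type_of(seq, list(p)), eps)


p = {0: Fraction(1, 2), 1: Fraction(1, 2)}
balanced = [0, 1, 1, 0, 1, 0, 0, 1]
constant = [1] * 8
print(type_of(balanced, [0, 1]))
print(is_typical(balanced, p, Fraction(1, 10)), is_typical(constant, p, Fraction(1, 10)))
```

Exact rational arithmetic is used so that the boundary case of the inequality is not affected by rounding.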

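We define slightly modified versions of the sets of distributions from Section 3.3.3
as follows:

\[ \mathcal{Q}^\eta_{\mathcal{S},r} \triangleq \bigcup_{q \in \mathcal{Q}_{\mathcal{S},r}} B_\eta(q), \]

\[ \mathcal{Q}^\eta(\mathfrak{V}) \triangleq \bigcap_{\mathcal{S} \in \mathfrak{V}} \bigcup_{r' \in \mathcal{R}(\mathcal{S})} \mathcal{Q}^\eta_{\mathcal{S},r'}. \]

These sets are nearly the same as those defined earlier. We will eventually take the
limit as $\eta \to 0$, making them identical to $\mathcal{Q}_{\mathcal{S},r}$ and $\mathcal{Q}(\mathfrak{V})$, but it will be necessary
to have slightly expanded versions for use with finite block length.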

Finally,  we  will  need  the  following  lemma. 

Lemma 8 Given an arbitrary $n$ length distribution $q^n(y^n)$ and a type $t$ with
denominator $n$ on $\mathcal{Y}$, let $q_i(y)$ be the marginal distribution of $q^n$ at time $i$ and $\bar q(y) =
\frac{1}{n} \sum_{i=1}^n q_i(y)$. If $Y^n$ is distributed according to $q^n$ and $\Pr(Y^n \in \Lambda_t^n(Y)) \ge 2^{-n\zeta}$,
then $D(t \| \bar q) \le \zeta$.

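Proof: Fix an integer $\tilde n$. For $i = 1, \ldots, \tilde n$, let $Y^n(i)$ be independently generated
from $q^n$. Let $\Gamma$ be the set of types $t^n$ on supersymbols in $\mathcal{Y}^n$ with denominator $\tilde n$
such that $t^n(y^n) = 0$ if $y^n \notin \Lambda_t^n(Y)$. Note that

\[ |\Gamma| \le (\tilde n + 1)^{|\mathcal{Y}|^n}. \]

If $Y^{n\tilde n} = (Y^n(1), \ldots, Y^n(\tilde n))$, then, since the $\tilde n$ blocks are independent,

\[ \Pr\Big( Y^{n\tilde n} \in \bigcup_{t^n \in \Gamma} \Lambda^{\tilde n}_{t^n}(Y^n) \Big) = \Pr\big( Y^n(i) \in \Lambda_t^n(Y),\ \forall i \big) \ge 2^{-n\tilde n\zeta}. \]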
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But 

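\begin{align}
\Pr\Big( Y^{n\tilde n} \in \bigcup_{t^n \in \Gamma} \Lambda^{\tilde n}_{t^n}(Y^n) \Big)
&= \sum_{t^n \in \Gamma} \Pr\big( Y^{n\tilde n} \in \Lambda^{\tilde n}_{t^n}(Y^n) \big) \\
&\le \sum_{t^n \in \Gamma} 2^{-\tilde n D(t^n \| q^n)} \\
&\le (\tilde n + 1)^{|\mathcal{Y}|^n}\, 2^{-\tilde n \min_{t^n \in \Gamma} D(t^n \| q^n)}.
\end{align}

For any $t^n \in \Gamma$, letting $t_i$ be the marginal type at time $i$ gives $\frac{1}{n} \sum_{i=1}^n t_i = t$.
Therefore

\begin{align}
\zeta + \frac{1}{n\tilde n} |\mathcal{Y}|^n \log(\tilde n + 1)
&\ge \min_{t^n \in \Gamma} \frac{1}{n} D(t^n \| q^n) \\
&\ge \min_{t^n \in \Gamma} \frac{1}{n} \sum_{i=1}^n D(t_i \| q_i) \tag{3.16} \\
&\ge D(t \| \bar q) \tag{3.17}
\end{align}

where (3.16) holds by [91, Lemma 4.3] and (3.17) by convexity of the Kullback-Leibler
distance in both arguments. Letting $\tilde n$ grow proves the lemma. □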
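
The convexity step (3.17) can be illustrated numerically. The sketch below, which assumes randomly generated per-time distributions $t_i$ and $q_i$ on a three-letter alphabet (purely hypothetical data), checks that the average of $D(t_i \| q_i)$ is at least the divergence between the averaged distributions.

```python
import math
import random


def kl(t, q):
    """Kullback-Leibler divergence D(t || q) in bits for pmfs given as lists."""
    return sum(ti * math.log2(ti / qi) for ti, qi in zip(t, q) if ti > 0)


def random_pmf(k, rng):
    """Strictly positive random pmf on k symbols (illustrative only)."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
n, k = 5, 3
ts = [random_pmf(k, rng) for _ in range(n)]   # marginal types t_i
qs = [random_pmf(k, rng) for _ in range(n)]   # marginal distributions q_i

t_bar = [sum(t[a] for t in ts) / n for a in range(k)]
q_bar = [sum(q[a] for q in qs) / n for a in range(k)]

average_divergence = sum(kl(t, q) for t, q in zip(ts, qs)) / n
divergence_of_averages = kl(t_bar, q_bar)
print(average_divergence, divergence_of_averages)
```

The first printed value is never smaller than the second, which is joint convexity of the divergence applied with equal weights $1/n$.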

The achievability proof proceeds as follows. Section 3.5.3 describes our
proposed coding scheme for the case that traitors cannot eavesdrop. In Section 3.5.4,
we demonstrate that this coding scheme achieves small probability of error when
the traitors have perfect information. Section 3.5.5 shows that the coding scheme
achieves the rate function $R^*(\mathcal{H}, r)$. In Section 3.5.6, we extend the proof to
include the case that the traitors have imperfect information. Finally, Section 3.5.7
gives a modification to the coding scheme that can handle eavesdropping traitors.

3.5.3  Coding  Scheme  Procedure 

Our  basic  coding  strategy  is  for  a  node  to  transmit  a  sequence  of  small  messages 
to  the  decoder  until  the  decoder  has  received  enough  information  to  decode  the 
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node’s  source  sequence.  After  receiving  one  of  these  messages,  the  decoder  asks 
for  another  small  message  only  if  it  is  unable  to  decode  the  sequence.  If  it  can,  the 
decoder  moves  on  to  the  next  node.  This  way,  the  rate  at  which  a  node  transmits 
is  as  small  as  possible.  Once  each  node’s  source  sequence  has  been  decoded,  the 
decoder  attempts  to  use  them  to  accumulate  information  about  which  nodes  could 
be  traitors.  It  is  in  this  step  that  it  uses  its  knowledge  of  the  power  of  the  traitors  to 
tell  the  difference  between  a  node  that  could  be  honest  under  some  circumstances 
and  one  that  is  surely  a  traitor.  After  this,  the  decoder  goes  back  across  all  the 
nodes  again,  repeating  the  same  procedure  for  the  next  block  of  source  values  and 
ignoring  those  nodes  that  it  knows  to  be  traitors.  The  decoder  repeats  this  again 
and  again,  gathering  more  information  about  which  nodes  could  be  traitors  each 
time.  The  precise  description  of  the  coding  strategy  follows. 

1) Random Code Structure: Fix $\epsilon > 0$. The maximum number of small
messages that could be sent by node $i$ when transmitting a certain sequence to the
decoder is $J_i = \lceil \log|\mathcal{Y}_i| / \epsilon \rceil$. Each of these small messages is represented by a function
to be defined, taking the source sequence as input and producing the small message
as output. In addition, as we discussed in Section 3.1.5, it is necessary to randomize the
messages at run time in order to defeat the traitors. Thus, node $i$ has $C$ different
but identically created subcodebooks, each of which is made up of a sequence of $J_i$
functions, one for each small message, where $C$ is an integer to be defined. Hence
the full codebook for node $i$ is composed of $CJ_i$ separate functions. In particular,
for $i = 1, \ldots, L$ and $c = 1, \ldots, C$, let

\[ \tilde f_{i,c,1} : \mathcal{Y}_i^n \to \{1, \ldots, 2^{n(\epsilon+\nu)}\}, \]

\[ \tilde f_{i,c,j} : \mathcal{Y}_i^n \to \{1, \ldots, 2^{n\epsilon}\}, \quad j = 2, \ldots, J_i, \]

with $\nu$ to be defined later. Thus, a subcodebook associates with each element of
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y™  a  sequence  of  about  n(log  \y^\  +  u)  bits  chopped  into  small  messages  of  length 
$n(\epsilon + \nu)$ or $n\epsilon$. We put tildes on these functions to distinguish them from the
$f$s defined in (3.6). The $\tilde f$s that we define here are functions we use as pieces of
the overall encoding functions $f$. Each one is constructed by a uniform random
binning procedure. Define composite functions

\[ F_{i,c,j}(y_i^n) = \big( \tilde f_{i,c,1}(y_i^n), \ldots, \tilde f_{i,c,j}(y_i^n) \big). \]

We can think of $F_{i,c,j}(y_i^n)$ as an index of one of $2^{n(j\epsilon+\nu)}$ random bins.
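
The codebook structure above can be sketched in code. This is an illustration only: it replaces the truly random binning of the proof with a seeded hash (an assumption, chosen so that no table over all of $\mathcal{Y}_i^n$ has to be stored), and the names `small_message`, `composite` and `num_messages` are hypothetical.

```python
import hashlib
import math


def num_messages(alphabet_size, eps):
    """J_i = ceil(log|Y_i| / eps), the maximum number of small messages."""
    return math.ceil(math.log2(alphabet_size) / eps)


def small_message(seed, i, c, j, y, n, eps, nu):
    """Stand-in for f~_{i,c,j}: bin index of sequence y in subcodebook c.

    The first message indexes one of 2^{n(eps+nu)} bins and every later
    message one of 2^{n eps} bins. A hash of (seed, i, c, j, y) plays the
    role of the uniform random bin assignment.
    """
    bits = n * (eps + nu) if j == 1 else n * eps
    num_bins = max(1, math.ceil(2 ** bits))
    data = repr((seed, i, c, j, tuple(y))).encode()
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") % num_bins


def composite(seed, i, c, j, y, n, eps, nu):
    """F_{i,c,j}(y): the first j small messages of subcodebook c."""
    return tuple(small_message(seed, i, c, k, y, n, eps, nu) for k in range(1, j + 1))


n, eps, nu = 16, 0.25, 0.5
y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]
J = num_messages(2, eps)
print(J, composite(seed=7, i=1, c=3, j=J, y=y, n=n, eps=eps, nu=nu))
```

Each additional transaction refines the bin: $F_{i,c,j}$ is a prefix of $F_{i,c,j+1}$, so the decoder can accumulate messages without the node resending earlier ones.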


2) Round Method: Our coding scheme is made up of $N$ rounds, with each round
composed of $L$ phases. In the $i$th phase, transactions are made entirely with node
$i$. We denote $Y_i^n(I)$ as the $I$th block of $n$ source values, but for convenience, we
will not include the index $I$ when it is clear from context. As in the three-node
example, all transactions in the $I$th round are based only on $Y_{\mathcal{M}}^n(I)$. Thus the total
block length is $Nn$.


The procedure for each round is identical except for the variable $\mathfrak{V}(I)$
maintained by the decoder. This represents the collection of sets that could be the
set of honest nodes based on the information the decoder has received as of the
beginning of round $I$. The decoder begins by setting $\mathfrak{V}(1) = \mathfrak{H}$ and then pares it
down at the end of each round based on new information.


3) Encoding and Decoding Rules: In the $i$th phase, if $i \in \mathcal{U}(\mathfrak{V}(I))$, the decoder
makes a number of transactions with node $i$ and produces an estimate $\hat Y_i^n$ of $Y_i^n$.
The estimate $\hat Y_i^n$ is of course a random variable, so as usual the lower case $\hat y_i^n$ refers
to a realization of this variable. If $i \notin \mathcal{U}(\mathfrak{V}(I))$, then the decoder has determined
that node $i$ cannot be honest, so it does not communicate with it and sets $\hat y_i^n$ to a
null value.
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For  i  G  U(QJ(I)),  at  the  beginning  of  phase  i,  node  i  randomly  selects  a  c  G 
$\{1, \ldots, C\}$ according to the uniform distribution. In the first transaction, node
$i$ transmits $(c, \tilde f_{i,c,1}(Y_i^n))$. That is, along with the small message itself, the node
transmits the randomly selected index $c$ of the subcodebook that it will use in this
phase. As the phase continues, in the $j$th transaction, node $i$ transmits $\tilde f_{i,c,j}(Y_i^n)$.


After each transaction, the decoder must decide whether to ask for another
transaction with node $i$, and if not, to decode $Y_i^n$. In the random binning proof
approach to the traditional Slepian-Wolf problem, the decoder decides which
sequence in the received bin to select as the source estimate by taking the one
contained in the typical set. Here we use the same idea, except that instead of the
typical set, we use a different set for each transaction, and if there is no sequence
in this set that falls into the received bin, this means not that we cannot decode
the sequence but rather that we have not yet received enough information from
the node and must ask for another transaction. The set associated with the $j$th
transaction needs to have the property that its size is less than $2^{n(j\epsilon+\nu)}$, the
number of bins into which the source space has been split after $j$ messages, so that it
is unlikely for two elements of the set to fall into the same bin. Furthermore, in
order to ensure that we eventually decode any sequence that might be chosen by
the node, the set should grow after each transaction and eventually contain all of
$\mathcal{Y}_i^n$.

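Now we define this set. First let $\mathcal{S}_i \triangleq \{1, \ldots, i\} \cap \mathcal{U}(\mathfrak{V}(I))$, the nodes up to $i$
that are not ignored by the decoder, and let $\hat y^n_{\mathcal{S}_{i-1}}$ be the source sequences decoded
in this round prior to phase $i$. The set associated with transaction $j$ is

\[ T_j(\hat y^n_{\mathcal{S}_{i-1}}) \triangleq \Big\{ y_i^n : H_{t(y_i^n \hat y^n_{\mathcal{S}_{i-1}})}(Y_i | Y_{\mathcal{S}_{i-1}}) \le j\epsilon \Big\}. \tag{3.18} \]

To be specific, after $j$ transactions, if there are no sequences in $T_j(\hat y^n_{\mathcal{S}_{i-1}})$ matching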
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the  received  value  of  FijCj,  the  decoder  chooses  to  do  another  transaction  with 
node $i$. If there is at least one such sequence, call it $\hat y_i^n$, choosing between several
possibilities arbitrarily.

Observe that, by counting conditional types,

\[ |T_j(\hat y^n_{\mathcal{S}_{i-1}})| \le (n+1)^{|\mathcal{Y}_i \times \mathcal{Y}_{\mathcal{S}_{i-1}}|}\, 2^{nj\epsilon}. \]

Hence $T_j$ satisfies the size property that was discussed above. Moreover, it grows
with $j$ to eventually become $\mathcal{Y}_i^n$. Finally, we have chosen $T_j$ in particular because
it has the property that when a sequence $y_i^n$ falls into $T_j$ for the first time, the
rate at which node $i$ has transmitted to the decoder is close to the entropy of the
type of $y_i^n$. This means that we can relate the accuracy of the decoded sequences
to the achieved rate, which will allow us to prove that the coding scheme achieves
the claimed rate.
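
The link between the transaction count and the empirical conditional entropy can be made concrete. The sketch below assumes the decoding set is defined by the condition $H_t(Y_i | Y_{\mathcal{S}_{i-1}}) \le j\epsilon$ and computes the first transaction at which a given sequence enters that set; the function names are hypothetical, and the side information is passed as one symbol per time index.

```python
import math
from collections import Counter


def empirical_cond_entropy(y, side):
    """H_t(Y | S) in bits, where t is the joint type of the pair (y, side)."""
    n = len(y)
    joint = Counter(zip(y, side))
    side_counts = Counter(side)
    return -sum(c / n * math.log2(c / side_counts[s]) for (_, s), c in joint.items())


def first_decodable_transaction(y, side, eps):
    """Smallest j >= 1 with H_t(Y | S) <= j * eps.

    The small slack guards against floating-point rounding at the boundary.
    """
    h = empirical_cond_entropy(y, side)
    return max(1, math.ceil(h / eps - 1e-12))


eps = 0.25
side = [0, 0, 1, 1]
copy_of_side = [0, 0, 1, 1]   # fully determined by the side information
unrelated = [0, 1, 0, 1]      # empirically independent of the side information

j_copy = first_decodable_transaction(copy_of_side, side, eps)
j_unrelated = first_decodable_transaction(unrelated, side, eps)
print(j_copy, j_unrelated)
```

In both cases $j\epsilon$ exceeds the empirical conditional entropy by at most $\epsilon$, which is the property used later to bound the rate.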

4) Round Conclusion: At the end of round $I$, the decoder produces $\mathfrak{V}(I+1)$
by setting

\[ \mathfrak{V}(I+1) = \Big\{ \mathcal{S} \in \mathfrak{V}(I) : t(\hat y^n_{\mathcal{U}(\mathfrak{V}(I))}) \in \bigcup_{r' \in \mathcal{R}(\mathcal{S})} \mathcal{Q}^\eta_{\mathcal{S},r'} \Big\} \tag{3.19} \]

for $\eta$ to be defined such that $\eta \ge \epsilon$ and $\eta \to 0$ as $\epsilon \to 0$. As we will show, it is
essentially impossible for the traitors to transmit messages such that the type of the
decoded messages does not fall into $\mathcal{Q}^\eta_{\mathcal{H},r}$, meaning that $\mathcal{H}$ is always in $\mathfrak{V}(I)$. This
ensures that the true honest nodes are never ignored and their source sequences
are always decoded correctly.
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3.5.4  Error  Probability 


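Define the following error events:

\begin{align}
\mathcal{E}_1(I, i) &\triangleq \{ \hat Y_i^n(I) \ne Y_i^n(I) \}, \\
\mathcal{E}_2(I) &\triangleq \{ \mathcal{H} \notin \mathfrak{V}(I) \}, \\
\mathcal{E}_3(I) &\triangleq \{ t(\hat Y^n_{\mathcal{U}(\mathfrak{V}(I))}(I)) \notin \mathcal{Q}^\eta_{\mathcal{H},r} \}.
\end{align}

The total probability of error is

\[ P_e = \Pr\Big( \bigcup_{I=1}^N \bigcup_{i \in \mathcal{H}} \mathcal{E}_1(I, i) \Big). \]

As we have said but not yet proved, $\mathcal{H}$ will usually be in $\mathfrak{V}(I)$ (i.e. $\mathcal{E}_2(I)$ does not
occur), so we do not lose much by writing

\[ P_e \le \Pr\Big( \bigcup_{I=1}^N \Big[ \mathcal{E}_2(I+1) \cup \bigcup_{i \in \mathcal{H}} \mathcal{E}_1(I, i) \Big] \Big). \]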

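Let

\[ \mathcal{A}_I \triangleq \mathcal{E}_2^c(I+1) \cap \bigcap_{i \in \mathcal{H}} \mathcal{E}_1^c(I, i) \]

for $I = 1, \ldots, N$, so

\[ 1 - P_e \ge \Pr(\mathcal{A}_1, \ldots, \mathcal{A}_N) = \prod_{I=1}^N \Pr(\mathcal{A}_I | \mathcal{A}_1, \ldots, \mathcal{A}_{I-1}). \]

Observe that $\mathcal{A}_I$ depends only on $Y^n_{\mathcal{H}}(I)$ and $\hat Y^n_{\mathcal{H}}(I)$, both of which are independent
of all events before round $I$ given that $\mathcal{H} \in \mathfrak{V}(I)$ (i.e. $\mathcal{E}_2^c(I)$ occurs), since this
is enough to ensure that $\hat Y^n_{\mathcal{H}}(I)$ is non-null. Since $\mathcal{A}_1, \ldots, \mathcal{A}_{I-1}$ includes $\mathcal{E}_2^c(I)$,
we can drop all conditioning terms except it. Note also that $\mathcal{E}_2^c(1)$ occurs with
probability 1. Therefore

\begin{align}
1 - P_e &\ge \prod_{I=1}^N \Pr(\mathcal{A}_I | \mathcal{E}_2^c(I)) \\
&= \prod_{I=1}^N \big[ 1 - \Pr(\mathcal{A}_I^c | \mathcal{E}_2^c(I)) \big] \ge 1 - \sum_{I=1}^N \Pr(\mathcal{A}_I^c | \mathcal{E}_2^c(I))
\end{align}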
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SO 


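\[ P_e \le \sum_{I=1}^N \Pr\Big( \mathcal{E}_2(I+1) \cup \bigcup_{i \in \mathcal{H}} \mathcal{E}_1(I, i) \,\Big|\, \mathcal{E}_2^c(I) \Big). \]

By (3.19), if $\mathcal{H}$ is in $\mathfrak{V}(I)$ but not in $\mathfrak{V}(I+1)$, then $t(\hat y^n_{\mathcal{U}(\mathfrak{V}(I))}(I)) \notin \mathcal{Q}^\eta_{\mathcal{H},r}$. Thus

\[ \mathcal{E}_2(I+1) \cap \mathcal{E}_2^c(I) \subset \mathcal{E}_3(I) \cap \mathcal{E}_2^c(I) \]

so

\begin{align}
P_e &\le \sum_{I=1}^N \Pr\Big( \mathcal{E}_3(I) \cup \bigcup_{i \in \mathcal{H}} \mathcal{E}_1(I, i) \,\Big|\, \mathcal{E}_2^c(I) \Big) \\
&\le \sum_{I=1}^N \Pr\Big( \mathcal{E}_3(I) \cap \bigcap_{i \in \mathcal{H}} \mathcal{E}_1^c(I, i) \,\Big|\, \mathcal{E}_2^c(I) \Big) + \sum_{I=1}^N \Pr\Big( \bigcup_{i \in \mathcal{H}} \mathcal{E}_1(I, i) \,\Big|\, \mathcal{E}_2^c(I) \Big) \\
&\le \sum_{I=1}^N \Pr\Big( \mathcal{E}_3(I) \,\Big|\, \bigcap_{i \in \mathcal{H}} \mathcal{E}_1^c(I, i) \Big) + \sum_{I=1}^N \sum_{i \in \mathcal{H}} \Pr\big( \mathcal{E}_1(I, i) \,\big|\, \mathcal{E}_2^c(I) \big) \tag{3.20}
\end{align}

where we have dropped the conditioning on $\mathcal{E}_2^c(I)$ in the first term because it
influences the probability of $\mathcal{E}_3(I)$ only in that it ensures that $\hat Y_i^n$ for $i \in \mathcal{H}$ are
non-null, which is already implied by $\bigcap_{i \in \mathcal{H}} \mathcal{E}_1^c(I, i)$.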


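We first bound the first term in (3.20) by showing that for all $I$,

\[ \Pr\Big( \mathcal{E}_3(I) \,\Big|\, \bigcap_{i \in \mathcal{H}} \mathcal{E}_1^c(I, i) \Big) \le \frac{\alpha}{2N}. \tag{3.21} \]

If the traitors receive perfect source information, then as we have already noted in
(3.13), $\mathcal{Q}_{\mathcal{H},r}$ only puts a constraint on the $Y_{\mathcal{H}}$ marginal of distributions, and the
same is true of $\mathcal{Q}^\eta_{\mathcal{H},r}$. In particular, $t(\hat Y^n_{\mathcal{U}(\mathfrak{V}(I))}(I)) \in \mathcal{Q}^\eta_{\mathcal{H},r}$ is equivalent to $\hat Y^n_{\mathcal{H}}(I)$
being typical. Conditioning on $\bigcap_{i \in \mathcal{H}} \mathcal{E}_1^c(I, i)$ implies that $\hat Y^n_{\mathcal{H}}(I) = Y^n_{\mathcal{H}}(I)$, so

\[ \Pr\Big( \mathcal{E}_3(I) \,\Big|\, \bigcap_{i \in \mathcal{H}} \mathcal{E}_1^c(I, i) \Big) \le \Pr\big( Y^n_{\mathcal{H}}(I) \notin T^n_\eta(Y_{\mathcal{H}}) \big) \]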
i&i 
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meaning  (3.21)  holds  for  sufficiently  large  n  by  the  AEP.  Thus  (3.21)  is  only  non¬ 
trivial  if  the  traitors  receive  imperfect  source  information.  This  case  is  dealt  with 
in  Section  3.5.6. 


We now consider the second term of (3.20), involving $\Pr(\mathcal{E}_1(I, i) | \mathcal{E}_2^c(I))$ for
honest $i$. Conditioning on $\mathcal{E}_2^c(I)$ ensures that $i \in \mathcal{U}(\mathfrak{V}(I))$ for honest $i$, so $\hat Y_i^n(I)$
will be non-null. The only remaining type of error is a decoding error. This occurs if
for some transaction $j$, there is a sequence in $T_j(\hat Y^n_{\mathcal{S}_{i-1}})$ different from $Y_i^n$ that
matches all thus far received messages. That is, if

\[ \exists\, j,\ y_i'^n \in T_j(\hat Y^n_{\mathcal{S}_{i-1}}) \setminus \{Y_i^n\} : F_{i,c,j}(y_i'^n) = F_{i,c,j}(Y_i^n). \]

However, $\mathcal{S}_{i-1}$ may contain traitors. Indeed, it may be made entirely of traitors.
Thus, we have to take into account that $\hat Y^n_{\mathcal{S}_{i-1}}$ may be chosen to ensure the
existence of such an erroneous $y_i'^n$. The node's use of randomizing among the $C$
subcodebooks is the method by which this is mitigated, as we will now prove.


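Let

\[ k_1(y_i^n, \hat y^n_{\mathcal{S}_{i-1}}) \triangleq \big| \{ c : \exists\, j,\ y_i'^n \in T_j(\hat y^n_{\mathcal{S}_{i-1}}) \setminus \{y_i^n\} : F_{i,c,j}(y_i'^n) = F_{i,c,j}(y_i^n) \} \big|. \]

That is, $k_1$ is the number of subcodebooks that if chosen could cause a decoding
error at some transaction. Recall that node $i$ chooses the subcodebook randomly
from the uniform distribution. Thus, given $y_i^n$ and $\hat y^n_{\mathcal{S}_{i-1}}$, the probability of an error
resulting from a bad choice of subcodebook is $k_1(y_i^n, \hat y^n_{\mathcal{S}_{i-1}})/C$. Furthermore, $k_1$ is
based strictly on the codebook, so we can think of it as a random variable defined
on the same probability space as that governing the random codebook creation.
Averaging over all possible codebooks,

\[ \Pr(\mathcal{E}_1(I, i) | \mathcal{E}_2^c(I)) \le \mathbb{E} \sum_{y_i^n} p(y_i^n) \max_{\hat y^n_{\mathcal{S}_{i-1}}} \frac{k_1(y_i^n, \hat y^n_{\mathcal{S}_{i-1}})}{C} \]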
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where  the  expectation  is  taken  over  codebooks. 


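Let $\mathcal{C}$ be the set of all codebooks. We define a subset $\mathcal{C}_1$, then show that the
probability of error can be easily bounded for any codebook in $\mathcal{C} \setminus \mathcal{C}_1$, and that the
probability of a codebook being chosen in $\mathcal{C}_1$ is small. In particular, let $\mathcal{C}_1$ be the
set of codebooks for which, for some $y_i^n \in \mathcal{Y}_i^n$ and $\hat y^n_{\mathcal{S}_{i-1}} \in \mathcal{Y}^n_{\mathcal{S}_{i-1}}$, $k_1(y_i^n, \hat y^n_{\mathcal{S}_{i-1}}) > B$,
for an integer $B \le C$ to be defined later. Then, bounding $k_1$ by $B$ outside $\mathcal{C}_1$ and
by $C$ inside it,

\begin{align}
\Pr(\mathcal{E}_1(I, i) | \mathcal{E}_2^c(I)) &\le \Pr(\mathcal{C} \setminus \mathcal{C}_1) \sum_{y_i^n} p(y_i^n) \max_{\hat y^n_{\mathcal{S}_{i-1}}} \frac{B}{C} + \Pr(\mathcal{C}_1) \sum_{y_i^n} p(y_i^n) \max_{\hat y^n_{\mathcal{S}_{i-1}}} \frac{C}{C} \\
&\le \frac{B}{C} + \Pr(\mathcal{C}_1). \tag{3.22}
\end{align}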

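Recall that $k_1$ is the number of subcodebooks that could cause an error. Since
each subcodebook is generated identically, $k_1$ is a binomial random variable with
$C$ trials and probability of success $P$, where $P$ is the probability that one particular
subcodebook causes an error. Thus

\begin{align}
P &= \Pr\big( \exists\, j,\ y_i'^n \in T_j(\hat y^n_{\mathcal{S}_{i-1}}) \setminus \{y_i^n\} : F_{i,c,j}(y_i'^n) = F_{i,c,j}(y_i^n) \big) \\
&\le \sum_{j=1}^{J_i} \sum_{y_i'^n \in T_j(\hat y^n_{\mathcal{S}_{i-1}}) \setminus \{y_i^n\}} \Pr\big( F_{i,c,j}(y_i'^n) = F_{i,c,j}(y_i^n) \big) \\
&\le \sum_{j=1}^{J_i} |T_j(\hat y^n_{\mathcal{S}_{i-1}})|\, 2^{-n(j\epsilon+\nu)} \\
&\le J_i (n+1)^{|\mathcal{Y}_i \times \mathcal{Y}_{\mathcal{S}_{i-1}}|}\, 2^{-n\nu} \le 2^{n(\epsilon-\nu)}
\end{align}

for sufficiently large $n$. For a binomial random variable $Y$ with mean $\bar Y$ and any
$k$, we can use the Chernoff bound to write

\[ \Pr(Y \ge k) \le \Big( \frac{e \bar Y}{k} \Big)^k. \tag{3.23} \]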
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Therefore 


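\[ \Pr\big( k_1(y_i^n, \hat y^n_{\mathcal{S}_{i-1}}) > B \big) \le \Big( \frac{eCP}{B+1} \Big)^{B+1} \le 2^{nB(\epsilon-\nu)} \]

if $\nu > \epsilon$ and $n$ is sufficiently large. Thus

\begin{align}
\Pr(\mathcal{C}_1) &= \Pr\big( \exists\, y_i^n, \hat y^n_{\mathcal{S}_{i-1}} : k_1(y_i^n, \hat y^n_{\mathcal{S}_{i-1}}) > B \big) \\
&\le \sum_{y_i^n} \sum_{\hat y^n_{\mathcal{S}_{i-1}}} \Pr\big( k_1(y_i^n, \hat y^n_{\mathcal{S}_{i-1}}) > B \big) \\
&\le \sum_{y_i^n} \sum_{\hat y^n_{\mathcal{S}_{i-1}}} 2^{nB(\epsilon-\nu)} \\
&= 2^{n[\log|\mathcal{Y}_i| + \log|\mathcal{Y}_{\mathcal{S}_{i-1}}| + B(\epsilon-\nu)]}. \tag{3.24}
\end{align}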
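
The tail bound used above, in the form $\Pr(Y \ge k) \le (e\bar Y / k)^k$ for a binomial $Y$ with mean $\bar Y$, can be sanity-checked against the exact binomial tail. The parameters below (number of subcodebooks and per-subcodebook error probability) are arbitrary illustrative values, not taken from the text.

```python
import math


def binomial_tail(trials, prob, k):
    """Exact Pr(Y >= k) for Y ~ Binomial(trials, prob)."""
    return sum(
        math.comb(trials, m) * prob ** m * (1 - prob) ** (trials - m)
        for m in range(k, trials + 1)
    )


def chernoff_bound(trials, prob, k):
    """The bound (e * mean / k)^k on Pr(Y >= k), for k >= 1."""
    mean = trials * prob
    return (math.e * mean / k) ** k


C, P = 50, 0.02   # hypothetical: 50 subcodebooks, each bad with probability 0.02
for k in range(1, 11):
    exact = binomial_tail(C, P, k)
    bound = chernoff_bound(C, P, k)
    print(k, exact, bound)
```

The bound is loose for small $k$ but decays super-exponentially once $k$ exceeds $e\bar Y$, which is what makes the union bound over all sequence pairs affordable.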


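Combining (3.20) with (3.21), (3.22), and (3.24) gives

\begin{align}
P_e &\le \frac{\alpha}{2} + \sum_{I=1}^N \sum_{i \in \mathcal{H}} \Big( \frac{B}{C} + 2^{n[\log|\mathcal{Y}_i| + \log|\mathcal{Y}_{\mathcal{S}_{i-1}}| + B(\epsilon-\nu)]} \Big) \\
&\le \frac{\alpha}{2} + NL \Big( \frac{B}{C} + 2^{n[\log|\mathcal{Y}_{\mathcal{M}}| + B(\epsilon-\nu)]} \Big)
\end{align}

which is less than $\alpha$ for sufficiently large $n$ if

\[ B > \frac{\log|\mathcal{Y}_{\mathcal{M}}|}{\nu - \epsilon} \]

and

\[ C > \frac{3NLB}{\alpha} > \frac{3NL \log|\mathcal{Y}_{\mathcal{M}}|}{\alpha(\nu - \epsilon)}. \]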
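
To see how these conditions interact, the sketch below picks the smallest integers $B$ and $C$ satisfying the two strict inequalities for some hypothetical parameter values, and reports the block length at which the subcodebook index costs at most $\epsilon$ bits per symbol. The function name and all numbers are illustrative assumptions.

```python
import math


def choose_B_and_C(log_alphabet, nu, eps, alpha, N, L):
    """Smallest integers with B > log|Y_M| / (nu - eps) and C > 3 N L B / alpha."""
    if nu <= eps:
        raise ValueError("the bound requires nu > eps")
    B = math.floor(log_alphabet / (nu - eps)) + 1
    C = math.floor(3 * N * L * B / alpha) + 1
    return B, C


# Hypothetical values: three binary sources, so log2|Y_M| = 3 bits.
log_alphabet, nu, eps, alpha, N, L = 3.0, 0.5, 0.25, 0.1, 10, 3
B, C = choose_B_and_C(log_alphabet, nu, eps, alpha, N, L)

# Sending the index c costs log2(C) bits per phase; this is at most
# eps bits per source symbol once n >= log2(C) / eps.
min_block_length = math.ceil(math.log2(C) / eps)
print(B, C, min_block_length)
```

With these choices the term $NLB/C$ stays below $\alpha/3$, leaving room for the exponentially vanishing term in the bound on $P_e$.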


3.5.5  Code  Rate 

The discussion above placed a lower bound on $C$. However, for sufficiently large
$n$, we can make $\frac{1}{n} \log C \le \epsilon$, meaning it takes no more than $\epsilon$ rate to transmit
the subcodebook index $c$ at the beginning of the phase. Therefore the rate for
phase $i$ is at most $(j+1)\epsilon + \nu$, where $j$ is the number of transactions in phase
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i.  Transaction  j  must  be  the  earliest  one  with  y”  G  Tj(y otherwise  it  would 
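have been decoded earlier. Thus $j$ is the smallest integer for which

\[ H_{t(\hat y_i^n \hat y^n_{\mathcal{S}_{i-1}})}(Y_i | Y_{\mathcal{S}_{i-1}}) \le j\epsilon, \]

meaning

\[ j\epsilon \le H_{t(\hat y_i^n \hat y^n_{\mathcal{S}_{i-1}})}(Y_i | Y_{\mathcal{S}_{i-1}}) + \epsilon. \tag{3.25} \]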

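By (3.19), for all $\mathcal{S} \in \mathfrak{V}(I+1)$, $t(\hat y^n_{\mathcal{U}(\mathfrak{V}(I))}) \in \bigcup_{r' \in \mathcal{R}(\mathcal{S})} \mathcal{Q}^\eta_{\mathcal{S},r'}$, meaning

\[ t(\hat y^n_{\mathcal{U}(\mathfrak{V}(I))}) \in \bigcap_{\mathcal{S} \in \mathfrak{V}(I+1)} \bigcup_{r' \in \mathcal{R}(\mathcal{S})} \mathcal{Q}^\eta_{\mathcal{S},r'} = \mathcal{Q}^\eta(\mathfrak{V}(I+1)). \]

Furthermore, from (3.21) we know that with probability at least $1 - \alpha$, $t(\hat y^n_{\mathcal{U}(\mathfrak{V}(I))}) \in
\mathcal{Q}^\eta_{\mathcal{H},r}$. Therefore

\[ t(\hat y^n_{\mathcal{U}(\mathfrak{V}(I))}) \in \mathcal{Q}^\eta_{\mathcal{H},r} \cap \mathcal{Q}^\eta(\mathfrak{V}(I+1)). \tag{3.26} \]

Combining (3.25) with (3.26) gives that with high probability, the rate for all of
round $I$ is at most

\begin{align}
&\sum_{i \in \mathcal{U}(\mathfrak{V}(I))} \Big[ H_{t(\hat y_i^n \hat y^n_{\mathcal{S}_{i-1}})}(Y_i | Y_{\mathcal{S}_{i-1}}) + 2\epsilon + \nu \Big] \\
&\quad \le H_{t(\hat y^n_{\mathcal{U}(\mathfrak{V}(I))})}(Y_{\mathcal{U}(\mathfrak{V}(I))}) + L(2\epsilon + \nu) \\
&\quad \le \sup_{q \in \mathcal{Q}^\eta_{\mathcal{H},r} \cap \mathcal{Q}^\eta(\mathfrak{V}(I+1))} H_q(Y_{\mathcal{U}(\mathfrak{V}(I))}) + L(2\epsilon + \nu) \\
&\quad \le \sup_{q \in \mathcal{Q}^\eta_{\mathcal{H},r} \cap \mathcal{Q}^\eta(\mathfrak{V}(I+1))} H_q(Y_{\mathcal{U}(\mathfrak{V}(I+1))}) + \sup_q H_q(Y_{\mathcal{U}(\mathfrak{V}(I)) \setminus \mathcal{U}(\mathfrak{V}(I+1))}) + L(2\epsilon + \nu) \\
&\quad \le \sup_{\mathfrak{V} \subset \mathfrak{H},\, q \in \mathcal{Q}^\eta_{\mathcal{H},r} \cap \mathcal{Q}^\eta(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}) + \log|\mathcal{Y}_{\mathcal{U}(\mathfrak{V}(I)) \setminus \mathcal{U}(\mathfrak{V}(I+1))}| + L(2\epsilon + \nu). \tag{3.27}
\end{align}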

Whenever $\mathcal{U}(\mathfrak{V}(I)) \setminus \mathcal{U}(\mathfrak{V}(I+1)) \ne \emptyset$, at least one node is eliminated. Therefore
the second term in (3.27) will be zero in all but at most $L$ rounds. Moreover,
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although  we  have  needed  to  bound  v  from  below,  we  can  still  choose  it  such  that 
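$\nu \to 0$ as $\epsilon \to 0$. Thus if $N$ is large enough, the rate averaged over all rounds is no
more than

\[ R_\epsilon(\mathcal{H}, r) = \sup_{\mathfrak{V} \subset \mathfrak{H},\, q \in \mathcal{Q}^\eta_{\mathcal{H},r} \cap \mathcal{Q}^\eta(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}) + \epsilon' \]

where $\epsilon' \to 0$ as $\epsilon \to 0$. This is a precisely $\alpha$-achievable rate function. By continuity
of entropy,

\[ \lim_{\epsilon \to 0} R_\epsilon(\mathcal{H}, r) = \sup_{\mathfrak{V} \subset \mathfrak{H},\, q \in \mathcal{Q}_{\mathcal{H},r} \cap \mathcal{Q}(\mathfrak{V})} H_q(Y_{\mathcal{U}(\mathfrak{V})}) = R^*(\mathcal{H}, r) \]

so $R^*(\mathcal{H}, r)$ is achievable.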


3.5.6  Imperfect  Traitor  Information 

We now consider the case that the traitors have access to imperfect information
about the sources. The additional required piece of analysis is to prove (3.21).
That is

\[ \Pr\big( t(\hat V^n \hat Z^n) \notin \mathcal{Q}^\eta_{\mathcal{H},r} \,\big|\, \hat V^n = V^n \big) \le \frac{\alpha}{2N} \tag{3.28} \]

where we define for notational convenience $V = Y_{\mathcal{H}}(I)$ and $Z = Y_{\mathcal{T} \cap \mathcal{U}(\mathfrak{V}(I))}(I)$.
Observe that we can drop the hat from $\hat V^n$ if we wish because of the conditioning
term.

To help explain the task in proving (3.28), we present a similar argument to
the one we used in Section 3.3.3 to interpret Theorem 7: we impose a constraint
on the traitors, then demonstrate that (3.28) would be easy to prove under this
constraint. Suppose that, given $W^n$, the traitors apply a function $h : \mathcal{W}^n \to \mathcal{Z}^n$
to get the sequence $Z^n = h(W^n)$, then report this $Z^n$ as the truth. Assuming the
decoder successfully decodes $Z^n$ so that $\hat Z^n = Z^n$, $V^n$ and $\hat Z^n$ would be distributed
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according  to 


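\[ q^n(v^n z^n) = \sum_{w^n} \Big[ \prod_{\tau=1}^n p(v_\tau)\, r(w_\tau | v_\tau) \Big] \mathbf{1}\{ z^n = h(w^n) \}. \]

By Lemma 8, the only $V, Z$ types $t$ that could be generated from this distribution
with substantial probability are those for which $t$ is close to $\bar q(vz)$. Furthermore,
we can write

\[ \bar q(vz) = p(v) \sum_w r(w | v)\, \bar q(z | w) \]

for some $\bar q(z | w)$. Thus $\bar q(vz) \in \mathcal{Q}_{\mathcal{H},r}$ by (3.7), so $t \in \mathcal{Q}^\eta_{\mathcal{H},r}$ for some small $\eta$. This
would prove (3.28).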


However, we cannot place any such limitations on the traitors' behavior. Our
goal will be to show that for any action, there exists a function $h$ such that the
behavior just described produces nearly the same effect. Observe that a
transmission made by the traitors is equivalent to a bin, or subset, of $\mathcal{Z}^n$: namely,
all sequences that would produce this transmission if the nodes were honest. The
decoder will choose an element of this bin as $\hat Z^n$, making its decision by selecting
one that agrees with $V^n$ (specifically, by always taking elements in $T_j$). Because
the traitors do not know $V^n$ exactly, they must select their transmitted bin so that
for every likely $v^n$, the bin contains some sequence agreeing with it. That is, each
element of the bin agrees with a certain set of $v^n$s, and the union of all these sets
must contain all likely values of $v^n$ given $W^n$. We will show that the distribution
of the sizes of these "agreement sets" is highly non-uniform. That is, even though
no single element of the bin agrees with all likely $v^n$, a small number of elements of
the bin agree with many more $v^n$s than the others. Therefore, transmitting this bin
is not much different from choosing one of these "special" elements and reporting
it as the truth.


The  manner  in  which  the  traitors  choose  a  bin  based  on  Wn  is  complicated 
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by  two  factors.  First,  they  must  choose  a  subcodebook  index  c  to  use  for  each 
traitor in $\mathcal{U}(\mathfrak{V}(I))$ before transmitting any information. Second, the exact rate at
which each traitor transmits depends on the number of small messages that it takes
for the decoder to construct a source estimate, which the traitors will not always
know a priori. Let $\mathbf{j} = \{j_i\}_{i \in \mathcal{T} \cap \mathcal{U}(\mathfrak{V}(I))}$ be the vector representing the number of
transactions (small messages) that take place with each traitor in $\mathcal{U}(\mathfrak{V}(I))$. There
are $J_{\mathcal{T}} \triangleq \prod_{i \in \mathcal{T} \cap \mathcal{U}(\mathfrak{V}(I))} J_i$ different possible values of $\mathbf{j}$. For a given $\mathbf{j}$, each set of
messages sent with this number of transactions is represented by a bin. Let $\mathfrak{B}_{\mathbf{j}}$ be
the set of these bins. Note that we include all choices of subcodebook indices in this
set; there are many different binnings for a given $\mathbf{j}$, any of which the traitors may
select. Now the traitors' behavior is completely described by a group of potentially
random functions $g_{\mathbf{j}} : \mathcal{W}^n \to \mathfrak{B}_{\mathbf{j}}$ for all $\mathbf{j}$. That is, if the traitors receive $W^n$, and
the numbers of transactions are given by $\mathbf{j}$, then their transmitted bin is $g_{\mathbf{j}}(W^n)$.
Note that when we refer to a bin, we mean not the index of the bin but the actual
set of sequences in that bin. Thus $g_{\mathbf{j}}(W^n)$ is a subset of $\mathcal{Z}^n$.

Consider a joint $v, z$ type $t$. We are interested in the circumstances under which
$(V^n \hat Z^n)$ has type $t$. Recall that in a given phase, the value of $j$ determines what
source sequences can be decoded without receiving additional messages from the
node. In particular, only those sequences in $T_j$ can be decoded. Thus, in order
to decode $\hat Z^n$ such that $(V^n \hat Z^n)$ has type $t$, $\mathbf{j}$ must be such that in every phase,
sequences of the proper type fall into $T_{j_i}$. Specifically, by (3.18), we need for every
$i$,

\[ H_t(Y_i | Y_{\mathcal{S}_{i-1}}) \le j_i \epsilon. \]

Hence, since conditioning reduces entropy and by the chain rule,

\[ \sum_{i \in \mathcal{T} \cap \mathcal{U}(\mathfrak{V}(I))} j_i \epsilon \ge H_t(Z | V). \]

Let $R(\mathbf{j})$ be the total rate transmitted by all the traitors in $\mathcal{U}(\mathfrak{V}(I))$ given $\mathbf{j}$. The
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transmitted  rate  by  node  i  is  jte  +  v,  so 

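\[ R(\mathbf{j}) = \sum_{i \in \mathcal{T} \cap \mathcal{U}(\mathfrak{V}(I))} [j_i \epsilon + \nu] \ge H_t(Z | V) + \nu. \]

Therefore if $(V^n \hat Z^n) \in \Lambda_t^n(VZ)$, then there exists a $\mathbf{j}$ such that $R(\mathbf{j}) \ge H_t(Z | V) + \nu$
and $g_{\mathbf{j}}(W^n) \cap \Lambda_t^n(Z | V^n)$ is not empty. Let $\delta \triangleq \frac{\alpha}{4N}$,

\[ \delta_{t,\mathbf{j}} \triangleq \Pr\big( (V^n W^n) \in T_\epsilon^n(VW),\ g_{\mathbf{j}}(W^n) \cap \Lambda_t^n(Z | V^n) \ne \emptyset \big) \]

and

\[ \mathcal{P} \triangleq \Big\{ t : \max_{\mathbf{j} :\, R(\mathbf{j}) \ge H_t(Z|V) + \nu} \delta_{t,\mathbf{j}} \ge \frac{\delta}{(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}} \Big\}. \]

We will show that $\mathcal{P} \subset \mathcal{Q}^\eta_{\mathcal{H},r}$, so that

\begin{align}
&\Pr\big( t(V^n \hat Z^n) \notin \mathcal{Q}^\eta_{\mathcal{H},r} \,\big|\, \mathcal{H} \in \mathfrak{V}(I) \big) \\
&\quad \le \Pr\big( t(V^n \hat Z^n) \notin \mathcal{P} \,\big|\, \mathcal{H} \in \mathfrak{V}(I) \big) \\
&\quad \le \Pr\big( (V^n W^n) \notin T_\epsilon^n(VW) \big) + \sum_{t \in \mathcal{P}^c} \Pr\big( (V^n W^n) \in T_\epsilon^n(VW),\ (V^n \hat Z^n) \in \Lambda_t^n(VZ) \big) \\
&\quad \le \delta + \sum_{t \in \mathcal{P}^c} \Pr\big( (V^n W^n) \in T_\epsilon^n(VW),\ \exists\, \mathbf{j} : R(\mathbf{j}) \ge H_t(Z|V) + \nu,\ g_{\mathbf{j}}(W^n) \cap \Lambda_t^n(Z | V^n) \ne \emptyset \big) \\
&\quad \le \delta + (n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}} \cdot \frac{\delta}{(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}} = 2\delta = \frac{\alpha}{2N}
\end{align}

for sufficiently large $n$.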


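Fix $t \in \mathcal{P}$. We show that $t \in \mathcal{Q}^\eta_{\mathcal{H},r}$. There is some $\mathbf{j}$ with

\[ R(\mathbf{j}) \ge H_t(Z | V) + \nu \tag{3.29} \]

and $\delta_{t,\mathbf{j}} \ge \frac{\delta}{(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}}$. Any random $g_{\mathbf{j}}$ is a probabilistic combination of a number
of deterministic functions, so if this lower bound on $\delta_{t,\mathbf{j}}$ holds for a random $g_{\mathbf{j}}$, it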
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must  also  hold  for  some  deterministic  g-}.  Therefore  we  do  not  lose  generality  to 
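assume from now on that $g_{\mathbf{j}}$ is deterministic. We also drop the $\mathbf{j}$ subscript for
convenience.

Define the following sets:

\begin{align}
A_\epsilon^n(V | w^n) &\triangleq \{ v^n \in T_\epsilon^n(V | w^n) : g(w^n) \cap \Lambda_t^n(Z | v^n) \ne \emptyset \}, \\
A_\epsilon^n(W) &\triangleq \Big\{ w^n \in T_\epsilon^n(W) : \Pr\big( V^n \in A_\epsilon^n(V | w^n) \,\big|\, W^n = w^n \big) \ge \frac{\delta}{2(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}} \Big\}.
\end{align}

Applying the definitions of $\mathcal{P}$ and $\delta_{t,\mathbf{j}}$ gives

\begin{align}
\frac{\delta}{(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}} &\le \Pr\big( (V^n W^n) \in T_\epsilon^n(VW) : g(W^n) \cap \Lambda_t^n(Z | V^n) \ne \emptyset \big) \\
&= \sum_{w^n \in T_\epsilon^n(W)} p(w^n) \Pr\big( V^n \in A_\epsilon^n(V | w^n) \,\big|\, W^n = w^n \big) \\
&\le \Pr\big( W^n \in A_\epsilon^n(W) \big) + \frac{\delta}{2(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}}
\end{align}

meaning $\Pr(W^n \in A_\epsilon^n(W)) \ge \frac{\delta}{2(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}}$. Fix $w^n \in A_\epsilon^n(W)$. Since $A_\epsilon^n(V | w^n) \subset
T_\epsilon^n(V | w^n)$,

\[ |A_\epsilon^n(V | w^n)| \ge \frac{\delta}{2(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}}\, 2^{n(H(V|W) - \epsilon)}. \tag{3.30} \]

Note also that

\begin{align}
|A_\epsilon^n(V | w^n)| &\le \sum_{v^n \in T_\epsilon^n(V | w^n)} |g(w^n) \cap \Lambda_t^n(Z | v^n)| \\
&= \sum_{z^n \in g(w^n)} |\Lambda_t^n(V | z^n) \cap T_\epsilon^n(V | w^n)|. \tag{3.31}
\end{align}

Let $k_2(z^n, w^n) \triangleq |\Lambda_t^n(V | z^n) \cap T_\epsilon^n(V | w^n)|$. This value is the size of the "agreement
set" as described above. Applying (3.30) and (3.31) gives

\begin{align}
\sum_{z^n \in g(w^n)} k_2(z^n, w^n) &\ge \frac{\delta}{2(n+1)^{|\mathcal{V} \times \mathcal{Z}|} J_{\mathcal{T}}}\, 2^{n(H(V|W) - \epsilon)} \\
&\ge 2^{n(H(V|W) - 2\epsilon)} \tag{3.32}
\end{align}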
(3.32) 
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for  sufficiently  large  n.  We  will  show  that  there  is  actually  a  single  zn  G  g(wn )  such 
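that $k_2(z^n, w^n)$ represents a large portion of the above sum, so $z^n$ itself is almost
as good as the entire bin. Then setting $h(w^n) = z^n$ will give us the properties we
need. Note that

\begin{align}
\sum_{z^n \in \mathcal{Z}^n} k_2(z^n, w^n) &= \sum_{v^n \in T_\epsilon^n(V | w^n)} |\Lambda_t^n(Z | v^n)| \\
&\le 2^{n(H(V|W) + H_t(Z|V) + \epsilon)}. \tag{3.33}
\end{align}

Moreover

\[ k_2(z^n, w^n) \le |T_\epsilon^n(V | w^n)| \le 2^{n(H(V|W) + \epsilon)} \]

so if for all $z^n$ we let $l(z^n)$ be the integer such that

\[ 2^{n(H(V|W) - l(z^n)\epsilon)} < k_2(z^n, w^n) \le 2^{n(H(V|W) - (l(z^n) - 1)\epsilon)} \tag{3.34} \]

then $l(z^n) \ge 0$ for all $z^n$. Furthermore, if $k_2(z^n, w^n) > 0$, then $l(z^n) \le L \triangleq
\lceil H(V|W) / \epsilon \rceil$. Let $M(l) \triangleq |\{ z^n \in \mathcal{Z}^n : l(z^n) = l \}|$. Then from (3.33), for any $l$,

\begin{align}
2^{n(H(V|W) + H_t(Z|V) + \epsilon)} &\ge \sum_{z^n \in \mathcal{Z}^n} k_2(z^n, w^n) \\
&\ge \sum_{z^n \in \mathcal{Z}^n :\, l(z^n) = l} k_2(z^n, w^n) \\
&\ge M(l)\, 2^{n(H(V|W) - l\epsilon)}
\end{align}

giving

\[ M(l) \le 2^{n(H_t(Z|V) + (l+1)\epsilon)}. \tag{3.35} \]

For any bin $b \in \mathfrak{B}_{\mathbf{j}}$, let $M(l, b) \triangleq |\{ z^n \in b : l(z^n) = l \}|$. Observe that when the bin
$b$ was created, it was one of $2^{nR(\mathbf{j})}$ bins into which all sequences in $\mathcal{Z}^n$ were placed.
Thus the probability that any one sequence was placed in $b$ was $2^{-nR(\mathbf{j})}$. Hence
$M(l, b)$ is a binomial random variable with $M(l)$ trials and probability of success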
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2  nihb.  Hence  by  (3.29)  and  (3.35), 

EM (l,  b )  <  M{1) 2~nm 

<  2^(Ht(Z\V)+(l+l)e)2~n{Ht(Z\V)+u) 

__  2n((l+l)e—v) 
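The binomial behaviour of the bin counts can be checked numerically. The following is a minimal sketch, not part of the proof: it assumes each of `num_sequences` items (playing the role of the $M(l)$ sequences) is thrown independently and uniformly into one of `num_bins` bins (playing the role of the $2^{nR}$ bins). The function names and the fixed seed are illustrative choices.

```python
import random
from collections import Counter


def random_binning(num_sequences, num_bins, seed=0):
    """Assign each sequence index to a bin chosen uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(num_bins) for _ in range(num_sequences)]


def bin_occupancy(assignment, num_bins):
    """Count how many sequences landed in each bin."""
    counts = Counter(assignment)
    return [counts.get(b, 0) for b in range(num_bins)]


def expected_occupancy(num_sequences, num_bins):
    """Mean of a Binomial(num_sequences, 1/num_bins) random variable."""
    return num_sequences / num_bins


if __name__ == "__main__":
    assignment = random_binning(1000, 8, seed=1)
    print(bin_occupancy(assignment, 8), expected_occupancy(1000, 8))
```

Each entry of the occupancy list is one realisation of a binomial count; the argument in the text discards codebooks in which any such count is far above its mean.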

We want to disregard all codebooks for which $M(l, b)$ is much larger than its expectation. In particular, let $\mathcal{C}_2$ be the set of codebooks such that for any group of nodes, subcodebooks, type $t$, transaction $j$, sequence $w^n \in \mathcal{W}^n$, bin $b$ and integer $l$, either $M(l, b) > 2^{n\epsilon}$ if $(l+1)\epsilon - \nu \leq 0$ or $M(l, b) > 2^{n((l+2)\epsilon-\nu)}$ if $(l+1)\epsilon - \nu > 0$. We will show that the probability of $\mathcal{C}_2$ is small, so we may disregard it. Again using (3.23), if $(l+1)\epsilon - \nu \leq 0$,

2ne 

Pr(%^)>2l<[^)]  <2-^ 

and if $(l+1)\epsilon - \nu > 0$,

r  p  -I  2™((i  +  2)e-^) 

Pr (M(l,b)  >  2n«l+2)e~u))  <  [— 

^  2^2"«i+2)e-i/) 


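We assume from now on that the codebook is not in $\mathcal{C}_2$, meaning in particular that $M(l, g(w^n)) \leq 2^{n\epsilon}$ for $(l+1)\epsilon - \nu \leq 0$ and $M(l, g(w^n)) \leq 2^{n((l+2)\epsilon-\nu)}$ for $(l+1)\epsilon - \nu > 0$. Applying these and (3.34) to (3.32) and letting $\tilde{l}$ be an integer defined later,

$$\begin{aligned}
2^{-n2\epsilon} &\leq 2^{-nH(V|W)} \sum_{z^n \in g(w^n)} k_2(z^n, w^n) \\
&\leq \sum_{l=0}^{L} M(l, g(w^n))\, 2^{-n(l-1)\epsilon} \\
&= \sum_{0 \leq l < \tilde{l}} M(l, g(w^n))\, 2^{-n(l-1)\epsilon}
 + \sum_{\tilde{l} \leq l \leq \nu/\epsilon - 1} M(l, g(w^n))\, 2^{-n(l-1)\epsilon}
 + \sum_{\nu/\epsilon - 1 < l \leq L} M(l, g(w^n))\, 2^{-n(l-1)\epsilon} \\
&\leq \sum_{0 \leq l < \tilde{l}} M(l, g(w^n))\, 2^{n\epsilon}
 + \sum_{\tilde{l} \leq l \leq \nu/\epsilon - 1} 2^{n\epsilon}\, 2^{-n(\tilde{l}-1)\epsilon}
 + \sum_{\nu/\epsilon - 1 < l \leq L} 2^{n((l+2)\epsilon-\nu) - n(l-1)\epsilon} \\
&\leq \sum_{0 \leq l < \tilde{l}} M(l, g(w^n))\, 2^{n\epsilon} + L\, 2^{n(-\tilde{l}+2)\epsilon} + L\, 2^{n(3\epsilon-\nu)}.
\end{aligned}$$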

Therefore

$$\sum_{0 \leq l < \tilde{l}} M(l, g(w^n)) \geq 2^{-n3\epsilon} \left(1 - L\, 2^{n(-\tilde{l}+4)\epsilon} - L\, 2^{n(5\epsilon-\nu)}\right).$$

Setting $\tilde{l} = 5$ and $\nu > 5\epsilon$ ensures that the right hand side is positive for sufficiently large $n$, so there is at least one $z^n \in g(w^n)$ with $|T^n_\epsilon(V|w^n) \cap \Lambda^n_t(V|z^n)| \geq 2^{n(H(V|W)-4\epsilon)}$. Now we define $h : \mathcal{W}^n \to \mathcal{Z}^n$ such that $h(w^n)$ is such a $z^n$ for $w^n \in A^n_\epsilon(W)$ and $h(w^n)$ is arbitrary for $w^n \notin A^n_\epsilon(W)$. If we let $Z^n = h(W^n)$, then


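$$\begin{aligned}
\Pr\big((V^n Z^n) \in \Lambda^n_t(VZ)\big)
&\geq \sum_{w^n \in T^n_\epsilon(W)} p(w^n) \Pr(V^n \in \Lambda^n_t(V|h(w^n)) \mid W^n = w^n) \\
&\geq \sum_{w^n \in A^n_\epsilon(W)} p(w^n) \Pr(V^n \in T^n_\epsilon(V|w^n) \cap \Lambda^n_t(V|h(w^n)) \mid W^n = w^n) \\
&\geq \Pr(W^n \in A^n_\epsilon(W))\, 2^{-n(H(V|W)+\epsilon)}\, 2^{n(H(V|W)-4\epsilon)} \\
&\geq \frac{\delta}{2(n+1)^{|\mathcal{V}\times\mathcal{Z}|} J_j}\, 2^{-n5\epsilon}.
\end{aligned}$$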

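The variables $(V^n W^n Z^n)$ are distributed according to

$$q^n(v^n w^n z^n) = \left[\prod_{\tau=1}^{n} p(v_\tau)\, r(w_\tau|v_\tau)\right] 1\{z^n = h(w^n)\}.$$

Let $q_\tau(vwz)$ be the marginal distribution of $q^n(v^n w^n z^n)$ at time $\tau$. It factors as

$$q_\tau(vwz) = p(v)\, r(w|v)\, q_\tau(z|w).$$

Let $\bar{q}(vz) = \frac{1}{n} \sum_{\tau=1}^{n} q_\tau(vz)$ and $\bar{q}(z|w) = \frac{1}{n} \sum_{\tau=1}^{n} q_\tau(z|w)$. Then

$$\bar{q}(vz) = \sum_{w} p(v)\, r(w|v)\, \bar{q}(z|w)$$

so by Lemma 8,

$$D\Big(t \,\Big\|\, \sum_{w} p(v)\, r(w|v)\, \bar{q}(z|w)\Big) \leq -\frac{1}{n} \log \frac{\delta}{2(n+1)^{|\mathcal{V}\times\mathcal{Z}|} J_j} + 5\epsilon.$$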

Therefore  t  G  Q^r  for  sufficiently  large  n  and  some  rj  such  that  rj  — *  0  as  e  — *  0. 


3.5.7  Eavesdropping  Traitors 

We consider now the case that the traitors are able to overhear communication between the honest nodes and the decoder. If the traitors have perfect information, then hearing the messages sent by honest nodes will not give them any additional information, so the above coding scheme still works identically. If the traitors have imperfect information, we need to slightly modify the coding scheme, but the achievable rates are the same.

The  important  observation  is  that  eavesdropping  traitors  only  have  access  to 
messages  sent  in  the  past.  Thus,  by  permuting  the  order  in  which  nodes  are  polled 
in  each  round,  the  effect  of  the  eavesdropping  can  be  eliminated.  In  a  given  round, 
let $\mathcal{H}'$ be the set of honest nodes that transmit before any traitor. Since the additional information gain from eavesdropping will be no more than the values of $Y^n_{\mathcal{H}'}$, the rate for this round, if no nodes are eliminated (i.e. $\mathcal{U}(\mathfrak{V}(I+1)) = \mathcal{U}(\mathfrak{V}(I))$), will be no more than the rate without eavesdropping when the traitors have access to $W'^n = (W^n, Y^n_{\mathcal{H}'})$. The goal of permuting the transmission order is to find an
ordering  in  which  all  the  traitors  transmit  before  any  of  the  honest  nodes,  since 
then  the  achieved  rate,  if  no  nodes  are  eliminated,  will  be  the  same  as  with  no 
eavesdropping.  If  you  are  reading  this,  email  me  with  the  magic  word  porcupine, 
and  I  will  send  you  twenty  dollars.  It  is  possible  to  determine  when  such  an  order 
occurs  because  it  will  be  the  order  that  produces  the  smallest  rate. 

More  specifically,  we  will  alter  the  transmission  order  from  round  to  round  in 
the following way. We always choose an ordering such that for some $S \in \mathfrak{V}$, the nodes $S^c$ transmit before $S$. We cycle through all such orderings until for each $S$, there has been one round with a corresponding ordering in which no nodes were eliminated. We then choose one $S$ that never produced a rate larger than the smallest rate encountered so far. We perform rounds in an order corresponding to $S$ from then on. If the rate ever changes and is no longer the minimum rate encountered so far, we choose a different minimizing $S$. The minimum rate will always be no greater than the achievable rate without eavesdropping, so after enough rounds, we achieve the same average rate.
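The ordering rule described above can be sketched as a small scheduler. This is an illustrative simplification, not the scheme of the proof: the class name `OrderingScheduler`, the method names and the helper `ordering_for` are hypothetical, the collection of candidate sets is assumed fixed, and a round in which nodes are eliminated is simply ignored rather than shrinking the collection.

```python
class OrderingScheduler:
    """Pick which candidate set S should transmit last in each round."""

    def __init__(self, candidate_sets):
        self.candidates = [frozenset(s) for s in candidate_sets]
        self.worst_rate = {}            # S -> largest rate S ever produced
        self.min_rate = float("inf")    # smallest rate seen in any round
        self.current = None

    def next_set(self):
        # Exploration: every S needs one round without eliminations.
        for s in self.candidates:
            if s not in self.worst_rate:
                return s
        # Keep the current S while it has never exceeded the minimum rate;
        # otherwise switch to the S with the smallest worst-case rate.
        if self.current is None or self.worst_rate[self.current] > self.min_rate:
            self.current = min(self.candidates, key=lambda s: self.worst_rate[s])
        return self.current

    def report(self, s, rate, eliminated):
        if eliminated:
            return  # this round does not count for S
        s = frozenset(s)
        self.worst_rate[s] = max(rate, self.worst_rate.get(s, rate))
        self.min_rate = min(self.min_rate, rate)


def ordering_for(s, all_nodes):
    """Nodes outside S transmit first, then the nodes in S."""
    return sorted(set(all_nodes) - set(s)) + sorted(s)
```

A caller would alternate `next_set()`, run a round with `ordering_for(...)`, and then call `report(...)` with the rate that round produced.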


3.6  Fixed-Rate  Coding 

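Consider an $L$-tuple of rates $(R_1, \ldots, R_L)$, encoding functions $f_i : \mathcal{Y}_i^n \to \{1, \ldots, 2^{nR_i}\}$ for $i \in \mathcal{M}$, and decoding function

$$g : \prod_{i=1}^{L} \{1, \ldots, 2^{nR_i}\} \to \mathcal{Y}_1^n \times \cdots \times \mathcal{Y}_L^n.$$

Let $I_i \in \{1, \ldots, 2^{nR_i}\}$ be the message transmitted by node $i$. If node $i$ is honest, $I_i = f_i(Y_i^n)$. If it is a traitor, it may choose $I_i$ arbitrarily, based on $W^n$. Define the probability of error $P_e = \Pr(Y^n_{\mathcal{H}} \neq \hat{Y}^n_{\mathcal{H}})$ where $(\hat{Y}_1^n, \ldots, \hat{Y}_L^n) = g(I_1, \ldots, I_L)$.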

We say an $L$-tuple $(R_1, \ldots, R_L)$ is deterministic-fixed-rate achievable if for any $\epsilon > 0$ and sufficiently large $n$, there exist coding functions $f_i$ and $g$ such that, for any choice of actions by the traitors, $P_e \leq \epsilon$. Let $\mathcal{R}_{\mathrm{dfr}} \subset \mathbb{R}^L$ be the set of deterministic-fixed-rate achievable $L$-tuples.

For randomized fixed-rate coding, the encoding functions become

$$f_i : \mathcal{Y}_i^n \times \mathcal{Z} \to \{1, \ldots, 2^{nR_i}\}$$

where $\mathcal{Z}$ is the alphabet for the randomness. If node $i$ is honest, $I_i = f_i(Y_i^n, \rho_i)$, where $\rho_i \in \mathcal{Z}$ is the randomness produced at node $i$. Define an $L$-tuple to be randomized-fixed-rate achievable in the same way as above, and $\mathcal{R}_{\mathrm{rfr}} \subset \mathbb{R}^L$ to be the set of randomized-fixed-rate achievable rate vectors.

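For any $S \subset \mathcal{M}$, let $\mathrm{SW}(Y_S)$ be the Slepian-Wolf rate region on the random variables $Y_S$. That is,

$$\mathrm{SW}(Y_S) \triangleq \Big\{R_S : \forall S' \subset S : \sum_{i \in S'} R_i \geq H(Y_{S'}|Y_{S \setminus S'})\Big\}.$$

Let

$$\begin{aligned}
\mathcal{R}^*_{\mathrm{rfr}} &\triangleq \{(R_1, \ldots, R_L) : \forall S \in \mathfrak{H} : R_S \in \mathrm{SW}(Y_S)\}, \\
\mathcal{R}^*_{\mathrm{dfr}} &\triangleq \{(R_1, \ldots, R_L) \in \mathcal{R}^*_{\mathrm{rfr}} : \forall S_1, S_2 \in \mathfrak{H} : \\
&\qquad \text{if } \exists r \in R(S_2) : H_r(Y_{S_1 \cap S_2}|W) = 0, \\
&\qquad \text{then } R_{S_1 \cap S_2} \in \mathrm{SW}(Y_{S_1 \cap S_2})\}.
\end{aligned}$$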
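Membership in the Slepian-Wolf region can be tested directly from its definition. The sketch below is illustrative only: it assumes the joint distribution of the sources is supplied as a dictionary mapping outcome tuples to probabilities, checks every non-empty subset $S'$, and uses the identity $H(Y_{S'}|Y_{S \setminus S'}) = H(Y_S) - H(Y_{S \setminus S'})$. The function names and the tolerance `tol` are assumptions made for the example.

```python
from itertools import combinations
from math import log2


def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(joint, coords):
    """Marginalise a joint pmf over outcome tuples onto the given coordinates."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out


def in_slepian_wolf_region(joint, rates, tol=1e-9):
    """Check sum_{i in S'} R_i >= H(Y_S' | Y_rest) for every non-empty S'."""
    everyone = tuple(range(len(rates)))
    h_all = entropy(marginal(joint, everyone))
    for k in range(1, len(rates) + 1):
        for subset in combinations(everyone, k):
            rest = tuple(i for i in everyone if i not in subset)
            h_cond = h_all - entropy(marginal(joint, rest))
            if sum(rates[i] for i in subset) < h_cond - tol:
                return False
    return True
```

For two independent fair bits the region requires at least one bit from each node, whereas for two identical fair bits only the sum rate is constrained.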


The  following  theorem  gives  the  rate  regions  explicitly. 


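Theorem 8 The fixed-rate achievable regions are given by

$$\mathcal{R}_{\mathrm{dfr}} = \mathcal{R}^*_{\mathrm{dfr}} \quad \text{and} \quad \mathcal{R}_{\mathrm{rfr}} = \mathcal{R}^*_{\mathrm{rfr}}.$$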


3.7  Proof  of  Theorem  8 

3.7.1  Converse  for  Randomized  Coding 

Assume $(R_1, \ldots, R_L)$ is randomized-fixed-rate achievable. Fix $S \in \mathfrak{H}$. Suppose $S^c$ are the traitors and perform a black hole attack. Thus $\hat{Y}^n_S$ must be based entirely on $\{f_i(Y_i^n)\}_{i \in S}$, and since $\Pr(\hat{Y}^n_S \neq Y^n_S)$ can be made arbitrarily small, by the converse of the Slepian-Wolf theorem, which holds even if the encoders may use randomness, $R_S \in \mathrm{SW}(Y_S)$.

3.7.2  Converse  for  Deterministic  Coding

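Assume $(R_1, \ldots, R_L)$ is deterministic-fixed-rate achievable. The converse for randomized coding holds equally well here, so $(R_1, \ldots, R_L) \in \mathcal{R}^*_{\mathrm{rfr}}$. We prove by contradiction that $(R_1, \ldots, R_L) \in \mathcal{R}^*_{\mathrm{dfr}}$ as well. Suppose $(R_1, \ldots, R_L) \in \mathcal{R}^*_{\mathrm{rfr}} \setminus \mathcal{R}^*_{\mathrm{dfr}}$, meaning that for some $S_1, S_2 \in \mathfrak{H}$, there exists $r \in R(S_2)$ such that $H_r(Y_{S_1 \cap S_2}|W) = 0$ but $R_{S_1 \cap S_2} \notin \mathrm{SW}(Y_{S_1 \cap S_2})$. Consider the case that $\mathcal{H} = S_2$ and $r$ is such that $H_r(Y_{S_1 \cap \mathcal{H}}|W) = 0$. Thus the traitors always have access to $Y^n_{S_1 \cap \mathcal{H}}$.

For all $S \in \mathfrak{H}$, let $D(Y_S)$ be the subset of $T^n_\epsilon(Y_S)$ such that all sequences in $D$ are decoded correctly if $S^c$ are the traitors and no matter what messages they send. Thus the probability that $Y^n_S \in D(Y_S)$ is large. Let $D(Y_{S_1 \cap \mathcal{H}})$ be the marginal intersection of $D(Y_{S_1})$ and $D(Y_{\mathcal{H}})$. That is, it is the set of sequences $y^n_{S_1 \cap \mathcal{H}}$ such that there exist $y^n_{S_1 \setminus \mathcal{H}}$ and $y^n_{\mathcal{H} \setminus S_1}$ with $(y^n_{S_1 \cap \mathcal{H}}\, y^n_{S_1 \setminus \mathcal{H}}) \in D(Y_{S_1})$ and $(y^n_{S_1 \cap \mathcal{H}}\, y^n_{\mathcal{H} \setminus S_1}) \in D(Y_{\mathcal{H}})$. Note that with high probability $Y^n_{S_1 \cap \mathcal{H}} \in D(Y_{S_1 \cap \mathcal{H}})$. Suppose $Y^n_{S_1 \cap \mathcal{H}} \in D(Y_{S_1 \cap \mathcal{H}})$ and $(Y^n_{S_1 \cap \mathcal{H}}\, Y^n_{\mathcal{H} \setminus S_1}) \in D(Y_{\mathcal{H}})$, so by the definition of $D$, $\hat{Y}^n_{S_1 \cap \mathcal{H}} = Y^n_{S_1 \cap \mathcal{H}}$. Since $R_{S_1 \cap \mathcal{H}} \notin \mathrm{SW}(Y_{S_1 \cap \mathcal{H}})$, there is some $y'^n_{S_1 \cap \mathcal{H}} \in D(Y_{S_1 \cap \mathcal{H}})$ mapping to the same codewords as $Y^n_{S_1 \cap \mathcal{H}}$ such that $y'^n_{S_1 \cap \mathcal{H}} \neq Y^n_{S_1 \cap \mathcal{H}}$. Because the traitors have access to $Y^n_{S_1 \cap \mathcal{H}}$, they can construct $y'^n_{S_1 \cap \mathcal{H}}$, and also find $y'^n_{S_1 \setminus \mathcal{H}}$ such that $(y'^n_{S_1 \cap \mathcal{H}}\, y'^n_{S_1 \setminus \mathcal{H}}) \in D(Y_{S_1})$. If the traitors report $y'^n_{S_1 \setminus \mathcal{H}}$, then we have a contradiction, since this situation is identical to that of the traitors being $S_1^c$, in which case, by the definition of $D$, $\hat{Y}^n_{S_1 \cap \mathcal{H}} = y'^n_{S_1 \cap \mathcal{H}}$.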

3.7.3  Achievability  for  Deterministic  Coding 

Fix $(R_1, \ldots, R_L) \in \mathcal{R}^*_{\mathrm{dfr}}$. Our achievability scheme will be a simple extension of the random binning proof of the Slepian-Wolf theorem given in [41]. Each encoding function $f_i : \mathcal{Y}_i^n \to \{1, \ldots, 2^{nR_i}\}$ is constructed by means of a random binning procedure. Decoding is then performed as follows. For each $S \in \mathfrak{H}$, if there is at least one $y^n_S \in T^n_\epsilon(Y_S)$ matching all received codewords from $S$, let $\hat{y}^n_{i,S}$ be one such sequence for all $i \in S$. If there is no such sequence, leave $\hat{y}^n_{i,S}$ null. Note that we produce a separate estimate $\hat{y}^n_{i,S}$ of $Y_i^n$ for all $S \ni i$. Let $\hat{y}^n_i$ equal one non-null $\hat{y}^n_{i,S}$.


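We now consider the probability of error. With high probability, $\hat{y}^n_{i,\mathcal{H}} = Y_i^n$ for honest $i$. Thus all we need to show is that for all other $S \in \mathfrak{H}$ with $i \in S$, $\hat{y}^n_{i,S}$ is null or also equal to $Y_i^n$. Fix $S \in \mathfrak{H}$. If there is some $r \in R(S)$ with $H_r(Y_{\mathcal{H} \cap S}|W) = 0$, then by the definition of $\mathcal{R}^*_{\mathrm{dfr}}$, $R_{\mathcal{H} \cap S} \in \mathrm{SW}(Y_{\mathcal{H} \cap S})$. Thus with high probability the only sequence $y^n_{\mathcal{H} \cap S} \in T^n_\epsilon(Y_{\mathcal{H} \cap S})$ matching all received codewords will be $Y^n_{\mathcal{H} \cap S}$, so $\hat{y}^n_{i,S} = Y_i^n$ for all $i \in \mathcal{H} \cap S$.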


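Now consider the case that $H_r(Y_{\mathcal{H} \cap S}|W) > 0$ for all $r \in R(S)$. For convenience, let $V = Y_{\mathcal{H} \cap S}$ and $Z = Y_{S \setminus \mathcal{H}}$. Let $R_V = \sum_{i \in \mathcal{H} \cap S} R_i$ and $R_Z = \sum_{i \in S \setminus \mathcal{H}} R_i$. Since $R_S \in \mathrm{SW}(Y_S)$, $R_V + R_Z \geq H(VZ) + \eta$ for some $\eta$. Let $b_V(v^n)$ be the set of sequences in $\mathcal{V}^n$ that map to the same codewords as $v^n$, and let $b_Z \subset \mathcal{Z}^n$ be the set of sequences mapping to the codewords sent by the traitors. Then $V$ may be decoded incorrectly only if there is some $v'^n \in b_V(V^n)$ and some $z^n \in b_Z$ such that $v'^n \neq V^n$ and $(v'^n z^n) \in T^n_\epsilon(VZ)$. For some $w^n \in \mathcal{W}^n$,

$$\begin{aligned}
&\Pr\big(\exists v'^n \in b_V(V^n) \setminus \{V^n\},\ z^n \in b_Z : (v'^n z^n) \in T^n_\epsilon(VZ) \,\big|\, W^n = w^n\big) \\
&\quad\leq \Pr(V^n \notin T^n_\epsilon(V|w^n) \mid W^n = w^n) + \sum_{v^n \in T^n_\epsilon(V|w^n)} p(v^n|w^n) \\
&\qquad\quad \cdot 1\{\exists v'^n \in b_V(v^n) \setminus \{v^n\},\ z^n \in b_Z : (v'^n z^n) \in T^n_\epsilon(VZ)\} \\
&\quad\leq \epsilon + 2^{-n(H(V|W)-\epsilon)} \sum_{z^n \in b_Z \cap T^n_\epsilon(Z)} k_3(z^n, w^n)
\end{aligned} \tag{3.36}$$

where

$$k_3(z^n, w^n) = |\{v^n \in T^n_\epsilon(V|w^n) : \exists v'^n \in b_V(v^n) \cap T^n_\epsilon(V|z^n) \setminus \{v^n\}\}|.$$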
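On average, the number of typical $v^n$ put into a bin is at most $2^{n(H(V)-R_V+\epsilon)}$, so we can use (3.23) to assume with high probability that no more than $2^{n(H(V)-R_V+2\epsilon)}$ are put into any bin. Note that

$$\begin{aligned}
\sum_{z^n \in T^n_\epsilon(Z)} k_3(z^n, w^n)
&\leq \sum_{z^n \in T^n_\epsilon(Z)} \sum_{v^n \in T^n_\epsilon(V|w^n)} |b_V(v^n) \cap T^n_\epsilon(V|z^n) \setminus \{v^n\}| \\
&= \sum_{v^n \in T^n_\epsilon(V|w^n)} \sum_{v'^n \in b_V(v^n) \cap T^n_\epsilon(V) \setminus \{v^n\}} |T^n_\epsilon(Z|v'^n)| \\
&\leq 2^{n(H(V|W)+\epsilon)}\, 2^{n(H(V)-R_V+2\epsilon)}\, 2^{n(H(Z|V)+\epsilon)} \\
&= 2^{n(H(VZ)+H(V|W)-R_V+4\epsilon)}.
\end{aligned}$$

The average $k_3$ sum over typical $z^n$ in a given bin is thus

$$2^{n(H(VZ)+H(V|W)-R_V-R_Z+4\epsilon)} \leq 2^{n(H(V|W)+4\epsilon-\eta)}.$$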


We can use an argument similar to that in Section 3.5.6, partitioning $T^n_\epsilon(Z)$ into different $l$ values, to show that with high probability, since $H(V|W) > 0$, for all bins $b_Z$,

$$\sum_{z^n \in T^n_\epsilon(Z) \cap b_Z} k_3(z^n, w^n) \leq 2^{n(H(V|W)+5\epsilon-\eta)}.$$

Applying this to (3.36) gives

$$\Pr\big(\exists v'^n \in b_V(V^n) \setminus \{V^n\},\ z^n \in b_Z : (v'^n z^n) \in T^n_\epsilon(VZ) \,\big|\, W^n = w^n\big) \leq \epsilon + 2^{n(6\epsilon-\eta)}.$$

Letting $\eta > 6\epsilon$ ensures that the probability of error is always small no matter what bin $b_Z$ the traitors choose.


3.7.4  Achievability  for  Randomized  Coding 

We perform essentially the same coding procedure as with deterministic coding, except that we also apply randomness in a similar fashion as with variable-rate coding. The only difference from the deterministic coding scheme is that each node has a set of $C$ identically created subcodebooks, from which it randomly chooses one, then sends the chosen subcodebook index along with the codeword. Decoding is the same as for deterministic coding. An argument similar to that in Section 3.5.4 can be used to show small probability of error.

CHAPTER  4

THE  CEO  PROBLEM 


4.1  Introduction 

In this chapter, we study the CEO Problem under adversarial attack. The CEO Problem is a special case of multiterminal source coding shown in Fig. 4.1. A source sequence $X^n$ is generated i.i.d. in time from a distribution $p(x)$. The decoder is interested in recovering $X^n$, but no nodes can observe it directly. Instead, node $i$ for $i = 1, \ldots, L$ observes $Y_i^n$, a corrupted form of $X^n$. Node $i$ then encodes its observation at rate $R_i$ to the decoder, which produces an estimate $\hat{X}^n$, which it attempts to make close to $X^n$ subject to some distortion constraint. The source sequences $(X^n, Y_1^n, \ldots, Y_L^n)$ are i.i.d. in time and correlated in space. The distribution of these variables is structured so that the $Y_i^n$ are conditionally independent given $X^n$. This conditional independence requirement is the characteristic property of the CEO Problem, and appears to make the problem simpler to solve. At a given time $t \in \{1, \ldots, n\}$, we assume that the sources $X(t), Y_1(t), \ldots, Y_L(t)$ are distributed according to

$$p(x y_1 \cdots y_L) = p(x) \prod_{i=1}^{L} p(y_i|x). \tag{4.1}$$

For  the  adversarial  problem,  we  assume  that  the  adversary  controls  any  s  of  the 
L  nodes.  We  adopt  the  “deterministic  fixed-rate”  model,  in  the  terms  of  Chapter  3, 
and  we  assume  the  adversary  has  complete  access  to  all  sources.  This  model  is 
as  pessimistic  as  possible,  but  to  ensure  robust  performance  we  err  on  the  side  of 
giving  traitors  more  power  rather  than  less. 

Unlike the Slepian-Wolf problem, the CEO Problem has the advantage that no node has a monopoly on any knowledge about the target source $X$. Therefore there is no need to redefine the notion of achievability from the usual definition for non-adversarial problems. That is, a guarantee on a certain level of distortion at the decoder for a certain set of rates from the nodes is a true guarantee, without any qualifications due to the presence of the adversary.

Figure 4.1: The CEO Problem. The source sequences $Y_i^n$ are each corrupted versions of $X^n$. The former sequences are observed by nodes 1 to $L$ and encoded versions are transmitted to the decoder, which attempts to recover $X^n$.

The ultimate goal is to characterize the rate-distortion region, which consists of all vectors $(R_1, \ldots, R_L, D)$ for which there exists a coding scheme that achieves average distortion $D$ between the true source $X^n$ and the estimate $\hat{X}^n$, given the data rate $R_i$ from node $i$ to the decoder for $i = 1, \ldots, L$. In Sec. 4.3, we provide a coding scheme that is a generalization of the Berger-Tung scheme [45, 46]. This scheme yields an inner bound on the rate-distortion region that applies to problems even more general than the CEO Problem. However, since we cannot prove that it is tight in general (indeed, the general CEO problem even without an adversary is open), we focus on two more specific regimes, in which we have somewhat better success.

First, we study the CEO problem with discrete sources, in which sources observed by nodes have the same conditional distribution for each node, and in the regime with many nodes and high rates. It was shown in [48] for the non-adversarial problem that with many nodes, the distortion falls exponentially with the sum-rate, and the rate of exponential decay was characterized there. In Sec. 4.4, we use the inner bound found in Sec. 4.3 to find a lower bound on the exponential decay rate with adversaries. In Sec. 4.5, we provide an upper bound on this decay rate.

The second regime in which we study the problem in more detail is the quadratic Gaussian version. Here, all sources are Gaussian and the target distortion function is quadratic. Without adversaries, the complete rate-distortion region was characterized in [55] and [56]. In Sec. 4.6, we use the inner bound from Sec. 4.3 to find an inner bound on the quadratic Gaussian problem with adversaries. In Sec. 4.7, we derive an outer bound on the rate region of the quadratic Gaussian problem with adversaries. Furthermore, along the lines of the asymptotic results for discrete sources originally proved in [48] and extended to our results in Sec. 4.4 and 4.5, we derive some asymptotic results for the quadratic Gaussian problem. It was originally shown in [53] that for many nodes the minimum achievable distortion fell like $K/R$ for sum-rate $R$. The exact value for the constant $K$ was found in [54]. In Sec. 4.8, we use our previously derived bounds in Sec. 4.6 and Sec. 4.7 to state and prove bounds on the proportionality constant $K$ for the adversary problem.

4.2  Problem  Description

Given block length $n$ and rates $R_i$ for $i = 1, \ldots, L$, the encoding function for agent $i$ is given by

$$f_i : \mathcal{Y}_i^n \to \{1, \ldots, 2^{nR_i}\}. \tag{4.2}$$

The decoding function at the decoder is given by

$$\phi : \prod_{i=1}^{L} \{1, \ldots, 2^{nR_i}\} \to \hat{\mathcal{X}}^n \tag{4.3}$$

where $\hat{\mathcal{X}}$ is the alphabet of the estimate of $X$, which may differ from $\mathcal{X}$. Denote by $C_i$ the codeword from the set $\{1, \ldots, 2^{nR_i}\}$ sent by node $i$ to the decoder. Honest nodes choose their transmissions by setting $C_i = f_i(Y_i^n)$. If $i$ is a traitor, then it may select $C_i$ in any manner it chooses, including using information about the honest coding strategy or the true values of the sources. Finally, the decoder produces its estimate of $X^n$ by setting $\hat{X}^n = \phi(C_1, \ldots, C_L)$.

The distortion function is given by

$$d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}. \tag{4.4}$$

This function measures the quality of the estimate $\hat{X}^n$ produced at the decoder; our goal will be to minimize its expected value for a given set of rates. For a given set of source values $(x^n, y_1^n, \ldots, y_L^n)$, we define the maximum possible distortion over all possible actions of the traitors to be

$$D(x^n, y_1^n, \ldots, y_L^n) = \max_{T \subset \{1, \ldots, L\} : |T| = s}\ \max_{C_T}\ \frac{1}{n} \sum_{t=1}^{n} d(x(t), \hat{x}(t)). \tag{4.5}$$

In this expression $T$ runs over all possible sets of traitors. We also maximize over $C_T$, the codewords sent by the traitors, ensuring that any potential traitor actions are considered. Observe that even the choice of which agents to capture may be a function of the source values. Note also that in (4.5) $\hat{x}^n$ is a function of $C_1, \ldots, C_L$ given by $\phi$, and $C_H$ is in turn a function of $y^n_H$ given by the $f_i$.
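For very small instances, the double maximization in (4.5) can be evaluated by brute force. The sketch below is illustrative and makes several assumptions not found in the text: the honest codewords are given directly, each node's codeword alphabet is `range(alphabet_sizes[i])`, and the toy decoder is a majority vote over single-bit codewords with block length one. All function names are hypothetical.

```python
from itertools import combinations, product


def worst_case_distortion(x, honest_codewords, s, alphabet_sizes, decoder, distortion):
    """Maximise average distortion over all traitor sets of size s and their codewords."""
    num_nodes = len(honest_codewords)
    worst = float("-inf")
    for traitors in combinations(range(num_nodes), s):
        # Enumerate every combination of codewords the traitors could send.
        for fake in product(*(range(alphabet_sizes[i]) for i in traitors)):
            received = list(honest_codewords)
            for i, c in zip(traitors, fake):
                received[i] = c
            x_hat = decoder(tuple(received))
            d = sum(distortion(a, b) for a, b in zip(x, x_hat)) / len(x)
            worst = max(worst, d)
    return worst


def majority_decoder(received):
    """Toy decoder: estimate one source bit by majority vote over the codewords."""
    return (1 if 2 * sum(received) > len(received) else 0,)


def hamming(x, x_hat):
    return 0 if x == x_hat else 1
```

With three nodes that all observe the source bit correctly, one traitor cannot change the majority; if one honest observation is already wrong, a single traitor can flip the estimate.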

We say the rate-distortion vector $(R_1, \ldots, R_L, D)$ is achievable if for any $\epsilon > 0$ and sufficiently large $n$ there exist encoding and decoding functions $f_1, \ldots, f_L$ and $\phi$ as specified in (4.2) and (4.3) such that

$$E\left[D(X^n, Y_1^n, \ldots, Y_L^n)\right] \leq D + \epsilon. \tag{4.6}$$

Let $D(R_1, \ldots, R_L)$ be the minimum achievable distortion for rates $R_1, \ldots, R_L$.

4.2.1  Error  Exponent  for  Discrete  Sources 

We now describe the error exponent problem for discrete sources. Assume that the conditional distribution of $Y_i$ given $X$ is the same for all $i$. That is, the distribution $p(y_i|x)$ does not depend on $i$. We may therefore specify the problem in terms of a distribution $p(x, y)$. We assume a Hamming distortion given by

$$d(x, \hat{x}) = \begin{cases} 0 & \text{if } x = \hat{x} \\ 1 & \text{if } x \neq \hat{x}. \end{cases} \tag{4.7}$$

For a fixed number of nodes $L$, sum-rate $R$, and $s$ traitors, let the minimum achievable distortion be

$$D(R, L, s) = \inf_{R_1, \ldots, R_L : R_1 + \cdots + R_L \leq R} D(R_1, \ldots, R_L). \tag{4.8}$$

Let the minimum achievable distortion at sum-rate $R$, for any number of nodes, and with the fraction of traitors no more than $\beta$, be

$$D(\beta, R) = \inf_{L, s : s \leq \beta L} D(R, L, s). \tag{4.9}$$

Observe that we assume that as the number of nodes $L$ grows, the fraction of traitors $s/L$ remains fixed at $\beta \in [0, 1]$. Our goal is to see how the fraction $\beta$ of traitors affects achievable rates. Finally, our quantity of interest is the error exponent given by

$$E(\beta) = \lim_{R \to \infty} \frac{-\log D(\beta, R)}{R}. \tag{4.10}$$

A lower bound on the error exponent is stated and proved in Sec. 4.4, and an upper bound in Sec. 4.5.

4.2.2  The  Quadratic  Gaussian  Problem 

In the quadratic Gaussian version of the problem, $X$ is a Gaussian random variable with zero mean and variance $\sigma_X^2$. The sources observed by the nodes are given by

$$Y_i = X + N_i \quad \text{for } i = 1, \ldots, L \tag{4.11}$$

where $N_i$ is a Gaussian random variable with zero mean and variance $\sigma_{N_i}^2$. The distortion function is quadratic, given by

$$d(x, \hat{x}) = (x - \hat{x})^2. \tag{4.12}$$

An inner bound on the rate-distortion region for this problem is stated and proved in Sec. 4.6, and an outer bound in Sec. 4.7.
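As a point of reference for the observation model (4.11), the sketch below simulates it and computes the standard minimum mean-squared-error estimate of $X$ from uncoded observations with equal noise variance. This is only a baseline under stated assumptions: there is no rate constraint and no coding, all nodes are assumed to share one noise variance `var_n`, and the function names are illustrative. It is not the coding scheme analyzed in this chapter.

```python
import random


def mmse_distortion(var_x, var_n, num_nodes):
    """Error variance of the MMSE estimate of X from num_nodes noisy looks."""
    return 1.0 / (1.0 / var_x + num_nodes / var_n)


def mmse_estimate(observations, var_x, var_n):
    """Linear MMSE estimate of X given Y_i = X + N_i with equal noise variance."""
    gain = (1.0 / var_n) / (1.0 / var_x + len(observations) / var_n)
    return gain * sum(observations)


def simulate_distortion(var_x, var_n, num_nodes, trials, seed=0):
    """Empirical squared error of the MMSE estimate when every node is honest."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, var_x ** 0.5)
        ys = [x + rng.gauss(0.0, var_n ** 0.5) for _ in range(num_nodes)]
        total += (x - mmse_estimate(ys, var_x, var_n)) ** 2
    return total / trials
```

Because the estimate is linear in the observations, replacing a single observation by a large value shifts it without bound, which is one way to see why the honest-node estimator cannot be reused unchanged when traitors are present.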

In addition, we characterize the asymptotic behavior of the distortion as a function of the sum-rate for many nodes. In particular, the minimum achievable distortion for sum-rate $R$ falls like $K\sigma_X^2/R$, and we are interested in $K$ as a function of $\beta$, again the fraction of traitors $s/L$, which is kept fixed for large $L$. More formally, let $D(R, L)$ be the minimum achievable distortion for $L$ agents where the sum-rate is at most $R$. In the case that all agents have the same quality of observation (i.e. $\sigma_{N_i}^2 = \sigma_N^2$ for all $i$), let $D(R) = \lim_{L \to \infty} D(R, L)$. Finally define

$$K(\beta) = \lim_{R \to \infty} \frac{R\, D(R)}{\sigma_X^2}. \tag{4.13}$$

That is, $D(R)$ goes like $K\sigma_X^2/R$ for large $R$. Bounds on $K(\beta)$ are stated and proved in Sec. 4.8.


4.3  Achievability  Scheme  for  Adversarial  Attacks 

We give an inner bound on the rate-distortion region for a somewhat broader class of problems than the CEO Problem as described in Sec. 4.2. We keep the basic format of the problem, in that the nodes observe $Y_i$ for $i = 1, \ldots, L$ and the decoder is interested in recovering $X$ subject to some distortion constraint, but we relax the condition that the $Y_i$ need be conditionally independent given $X$. Instead, we allow any distribution among these $L + 1$ variables given by

$$p(x y_1 \cdots y_L). \tag{4.14}$$

The  following  theorem  gives  an  inner  bound  on  the  rate-distortion  region  for  this 
problem. 

Theorem 9 Let $U_i$ for $i = 1, \ldots, L$ be random variables with alphabets $\mathcal{U}_i$ respectively, jointly distributed with $X, Y_1, \ldots, Y_L$ such that the following Markov chain constraints are satisfied:

$$U_i - Y_i - (X, Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_L, U_1, \ldots, U_{i-1}, U_{i+1}, \ldots, U_L) \quad \text{for } i = 1, \ldots, L. \tag{4.15}$$

We may write the distribution of these random variables as

$$\Pr(X = x, Y_1 = y_1, \ldots, Y_L = y_L, U_1 = u_1, \ldots, U_L = u_L) = p(x, y_1 \cdots y_L) \prod_{i=1}^{L} Q(u_i|y_i) \tag{4.16}$$

where $Q(u_i|y_i)$ completely specifies the variable $U_i$. The tuple $(R_1, \ldots, R_L, D)$ is achievable if there exist $\{U_i\}$ such that:

•  For all $S \subset \{1, \ldots, L\}$ with $|S| = L - 2s$ and all $A \subset S$,

$$\sum_{i \in A} R_i \geq I(Y_A; U_A|U_{S \setminus A}). \tag{4.17}$$

•  For all distributions $q(u^L)$, there exists a function

$$f_q : \prod_{i=1}^{L} \mathcal{U}_i \to \hat{\mathcal{X}} \tag{4.18}$$

such that the following property holds for all pairs of a set $S \subset \{1, \ldots, L\}$ with $|S| = L - s$ and a conditional distribution $r(u_{S^c}|x, u_S)$: Let

$$r(x, u^L) = \sum_{y_S} p(x, y_S) \left[\prod_{i \in S} Q(u_i|y_i)\right] r(u_{S^c}|x, u_S). \tag{4.19}$$

If $q(u^L) = r(u^L)$, then

$$D \geq E_r\left[d(X, f_q(U_1, \ldots, U_L))\right] \tag{4.20}$$

where the expectation is taken over the distribution $r(x, u^L)$ defined in (4.19).

We offer the following intuition for this result. Node $i$ sends to the decoder a degraded (or quantized) version of its measurement represented by $U_i$. If all nodes were honest, the joint distribution of $(X, U^L)$ would be given by

$$\sum_{y^L} p(x) \prod_{i=1}^{L} p(y_i|x)\, Q(u_i|y_i). \tag{4.21}$$

However, due to the presence of the traitors, the joint distribution of $(X, U^L)$ that actually occurs, which is represented by $r(x, u^L)$, may not match the distribution that would result with no traitors. Since the decoder can observe only $U^L$, it can only recover $q(u^L)$, from which it must choose the estimation function $f_q$. From $q$, the decoder can identify sets of nodes $S$ that may be the set of honest agents as the ones satisfying (4.19) for some $r$. However, there may be several possible sets that are indistinguishable to the decoder, and for each set many possibilities for $r$, each one representing a particular choice of action by the traitors. The decoder must construct its estimate by choosing a function $f_q$ that satisfies the distortion constraint for each of these possibilities, as (4.20) stipulates.

Fig. 4.2 shows the structure of the achievability strategy. The overall configuration is the same as the standard non-adversarial Berger-Tung strategy, in that Slepian-Wolf coding is used to relay quantized versions of the sources to the destination, after which the destination estimates $X$ from its recovered data. However, several of the blocks need to be changed from the non-adversarial strategy. In fact, for this problem the Slepian-Wolf blocks are almost exactly analogous to the strategies to defeat adversarial attack on the Slepian-Wolf problem studied in Chapter 3.

The  following  subsections  give  the  proof  of  Theorem  9. 

4.3.1  Coding  Strategy 

Descriptions of the codebook, and the encoding and decoding rules follow. We assume the existence of random variables $U_i$ for $i = 1, \ldots, L$ and functions $f_q$ for all distributions $q(u^L)$ satisfying the conditions of Theorem 9.

Figure 4.2: Diagram of achievable strategy for CEO-type problems. The strategy, described in detail in Sec. 4.3.1, differs from the standard Berger-Tung strategy mostly in the two blocks in the decoder. The Slepian-Wolf decoding block needs to be aware of the possibility of adversarial manipulations in recovering the $U_i$, and the estimation function $f_q$ used in the final block depends on the empirical distribution of the recovered $U_i$.

1) Random Code Structure: Each node $i$ forms its codebook in the following way. It generates $2^{n(I(Y_i; U_i)+\delta)}$ $n$-length codewords at random from the marginal distribution of $U_i$. Let $\mathcal{C}_i^{(n)}$ be the codeword set. These codewords are then placed into $2^{nR_i}$ bins uniformly at random.

2) Encoding Rule: Upon receiving $Y_i^n$, node $i$ selects uniformly at random an element of

$$\mathcal{C}_i^{(n)} \cap T_\epsilon^{(n)}(U_i|Y_i^n).$$

We denote this selected sequence $U_i^n$. Node $i$ then sends to the decoder the index of the bin containing $U_i^n$.

3) Decoding Rule: For each $S \subset \{1, \ldots, L\}$ with $|S| = L - s$, the decoder looks for a group of codewords in $T_\epsilon^{(n)}(U_S)$ that matches the received bins from all agents in $S$. If there is exactly one such sequence, call it $\hat{U}_i^n[S]$ for all $i \in S$. If there is no such sequence or more than one, define this to be null.

For all $i$, if there is exactly one non-null value of $\hat{U}_i^n[S]$ among all $S \ni i$, then call this sequence $\hat{U}_i^n$. If the values of $\hat{U}_i^n[S]$ are all null or they are inconsistent, then set $\hat{U}_i^n$ arbitrarily.

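Let $t(u^L)$ be the type of $(\hat{U}_1^n, \ldots, \hat{U}_L^n)$. Let $\mathcal{V}$ be the collection of sets $S$ for which $\hat{U}_S^n$ is jointly typical. This can be written as

$$\|t(u_S) - p(u_S)\|_\infty \leq \frac{\epsilon}{\prod_{i \in S} |\mathcal{U}_i|} \quad \text{for all } S \in \mathcal{V}. \tag{4.22}$$

Let $q(u^L)$ be the distribution minimizing

$$\|t(u^L) - q(u^L)\|_\infty \tag{4.23}$$

subject to

$$q(u_S) = p(u_S) \quad \text{for all } S \in \mathcal{V}. \tag{4.24}$$

The decoder chooses for its estimate $\hat{X}^n = f_q(\hat{U}_1^n, \ldots, \hat{U}_L^n)$, using the function corresponding to this distribution $q(u^L)$.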
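The consistency step of the decoding rule, which merges the per-set candidates $\hat{U}_i^n[S]$ into one sequence per node, can be sketched as follows. The data layout is an assumption made for illustration: `per_set_estimates` maps each candidate set $S$ (a frozenset of node indices) either to `None` (null) or to a dictionary from node index to a decoded sequence. Returning `None` stands in for "set arbitrarily"; the function name is hypothetical.

```python
def combine_estimates(per_set_estimates, num_nodes):
    """Merge per-set decodings into one sequence per node.

    A node's sequence is kept only if all non-null decodings that
    include the node agree on it; otherwise None is returned for it.
    """
    combined = {}
    for i in range(num_nodes):
        values = {
            tuple(estimate[i])
            for nodes, estimate in per_set_estimates.items()
            if i in nodes and estimate is not None
        }
        combined[i] = values.pop() if len(values) == 1 else None
    return combined
```

For example, with three nodes and one traitor, the candidate sets are the three pairs of nodes, and a node keeps its decoded sequence only when every pair containing it that decoded successfully reports the same value.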


4.3.2  Error  Analysis 

Consider  the  following  error  events: 

1. Node $i$ can find no conditionally typical codewords given the sequence $Y_i^n$. That is, the set

$$\mathcal{C}_i^{(n)} \cap T_\epsilon^{(n)}(U_i|Y_i^n) \tag{4.25}$$

is empty. With high probability, this does not occur by the standard proof of the point-to-point rate-distortion theorem [92].

2. The sequence $U_H^n$ is not jointly typical, where $H$ is the true set of honest agents. That this occurs with low probability follows from the fact that $Y_H^n$ will be jointly typical with high probability, and the Markov Lemma [46].

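3. There is a jointly typical codeword $u'^n_H$ different from $U_H^n$ but with $u'^n_i$ in the same bin as $U_i^n$ for all $i \in H$. It is shown in [93] that this occurs with low probability if for all $A \subset H$,

$$\sum_{i \in A} R_i \ge I(Y_A; U_A \mid U_{H \setminus A}). \tag{4.26}$$

This follows from (4.17) even though the size of $H$ is $L - s$ rather than $L - 2s$, by the following argument. We partition $A$ as $A = S_1 \cup \cdots \cup S_B \cup A'$, where the sets $S_b$ satisfy $|S_b| = L - 2s$ for $b = 1, \ldots, B$, and $|A'| < L - 2s$. Also let $S'$ be a set with $|S'| = L - 2s$ and $A' \subset S' \subset H$. We may write

\begin{align}
\sum_{i \in A} R_i &= \sum_{b=1}^{B} \sum_{i \in S_b} R_i + \sum_{i \in A'} R_i \tag{4.27} \\
&\ge \sum_{b=1}^{B} I(Y_{S_b}; U_{S_b}) + I(Y_{A'}; U_{A'} \mid U_{S' \setminus A'}) \tag{4.28} \\
&= \sum_{b=1}^{B} \left[ H(U_{S_b}) - H(U_{S_b} \mid Y_{S_b}) \right] + H(U_{A'} \mid U_{S' \setminus A'}) - H(U_{A'} \mid Y_{A'} U_{S' \setminus A'}) \tag{4.29} \\
&\ge H(U_{S_1} \cdots U_{S_B} \mid U_{H \setminus A}) + H(U_{A'} \mid U_{H \setminus A} U_{S_1} \cdots U_{S_B}) \nonumber \\
&\quad - \sum_{b=1}^{B} H(U_{S_b} \mid Y_{S_b}) - H(U_{A'} \mid Y_{A'} U_{S' \setminus A'}) \tag{4.30} \\
&= H(U_A \mid U_{H \setminus A}) - \sum_{b=1}^{B} H(U_{S_b} \mid Y_{S_b}) - H(U_{A'} \mid Y_{A'} U_{S' \setminus A'}) \tag{4.31} \\
&= H(U_A \mid U_{H \setminus A}) - H(U_A \mid Y_A U_{H \setminus A}) \tag{4.32} \\
&= I(Y_A; U_A \mid U_{H \setminus A}) \tag{4.33}
\end{align}

where (4.28) follows from several applications of (4.17), (4.30) follows because conditioning reduces entropy, (4.31) follows from the chain rule, and (4.32) follows because $U_A - Y_A - U_{H \setminus A}$ is a Markov chain.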

4. For some $S \ne H$ and $i \in H \cap S$, $\hat{U}_i^n[S] \ne U_i^n$. This can only occur if there is a jointly typical sequence that matches the bins sent by nodes in $H \cap S$ other than the true value of $U_{H \cap S}^n$. Note that $|H \cap S| \ge L - 2s$, so by (4.17) and the argument in (4.27)-(4.33), for all $A \subset H \cap S$, we have

$$\sum_{i \in A} R_i \ge I(Y_A; U_A \mid U_{H \cap S \setminus A}). \tag{4.34}$$

Therefore, again by the argument in [93], with high probability the only jointly typical sequence in the bins sent from nodes in $H \cap S$ will be the true value of $U_{H \cap S}^n$, so this error event does not occur.

4.3.3  Distortion  Analysis 

We have shown that error events (1)-(4) as described in Sec. 4.3.2 occur with small probability. Let us assume they do not occur. Hence for all $i \in H$, $\hat{U}_i^n = U_i^n$. Since $U_H^n$ is jointly typical, $H \in \mathcal{V}$. For all $S \in \mathcal{V}$, we have that $\| t(u_S) - p(u_S) \|_\infty \le \epsilon / \prod_{i \in S} |\mathcal{U}_i|$. Certainly if $\epsilon = 0$, then this implies $t(u_S) = p(u_S)$. Moreover, if $\epsilon = 0$ then the solution of the optimization problem in (4.23)-(4.24) would yield $q(u_L) = t(u_L)$. By continuity, when $\epsilon$ is nonzero, there must be some constant $C$ for which, for sufficiently small $\epsilon$,

$$\| q(u_L) - t(u_L) \|_\infty \le C\epsilon. \tag{4.35}$$

Moreover, by (4.24), $q(u_H) = p(u_H)$.

Let $t(x, u_L)$ be the joint type of $(X^n, \hat{U}_L^n)$. The average distortion is given by

$$\frac{1}{n} \sum_{t=1}^{n} d\big(x(t), f_q(u_1(t), \ldots, u_L(t))\big) = \sum_{x, u_L} t(x, u_L)\, d\big(x, f_q(u_1, \ldots, u_L)\big). \tag{4.36}$$

Let

$$r(x, u_L) = q(u_L)\, t(x \mid u_L). \tag{4.37}$$

Because $q(u_L)$ and $t(u_L)$ are close as given by (4.35), we can write

$$\| r(x, u_L) - t(x, u_L) \|_\infty = \| (q(u_L) - t(u_L))\, t(x \mid u_L) \|_\infty \le \| q(u_L) - t(u_L) \|_\infty \le C\epsilon. \tag{4.38}$$

Therefore the average distortion is upper bounded by

\begin{align}
&\sum_{x, u_L} \big( r(x, u_L) + C\epsilon \big)\, d\big(x, f_q(u_1, \ldots, u_L)\big) \tag{4.39} \\
&\quad \le \sum_{x, u_L} r(x, u_L)\, d\big(x, f_q(u_1, \ldots, u_L)\big) + C\epsilon \max_{x, \hat{x}} d(x, \hat{x}) \tag{4.40} \\
&\quad \le D + C\epsilon \max_{x, \hat{x}} d(x, \hat{x}) \tag{4.41}
\end{align}

where (4.41) follows from (4.20), which we may apply because $r(x, u_L)$ satisfies (4.19) with $S = H$, since $r(u_S) = q(u_S) = p(u_S)$ and $r(u_L) = q(u_L)$. The theorem follows by sending $\epsilon \to 0$.


4.4  Inner  Bound  on  Error  Exponent  for  Discrete  Sources 

We use Theorem 9 to prove a lower bound on the error exponent for discrete sources. Recall that in this problem the distribution of $Y_i$ given $X$ is identical for all $i$. We therefore describe our results in terms of the distribution of $X$ and one $Y_i$, given by $p(x, y)$. We introduce two auxiliary random variables $U$ and $J$. The variable $J$ takes values in $\mathcal{J}$ and is independent of $(X, Y)$ with marginal distribution $P_J(j)$; $X \to (Y, J) \to U$ is a Markov chain. The conditional distribution of $U$ is given by $Q(u \mid y, j)$, and we define for convenience

$$Q(u \mid x, j) = \sum_{y} p(y \mid x)\, Q(u \mid y, j). \tag{4.42}$$

We also introduce the vector $\gamma_j$ for all $j \in \mathcal{J}$. Let

$$F(P_J, Q, \gamma) = \frac{\min_{x_1, x_2} \sum_j \gamma_j D\big(Q_{\lambda, j} \,\|\, Q(u \mid x_1, j)\big)}{I(Y; U \mid X, J)} \tag{4.43}$$

where

$$Q_{\lambda, j}(u) = \frac{Q^{1-\lambda}(u \mid x_1, j)\, Q^{\lambda}(u \mid x_2, j)}{\sum_{u} Q^{1-\lambda}(u \mid x_1, j)\, Q^{\lambda}(u \mid x_2, j)} \tag{4.44}$$

and $\lambda$ is chosen so that

$$\sum_j \gamma_j D\big(Q_{\lambda, j} \,\|\, Q(u \mid x_1, j)\big) = \sum_j \gamma_j D\big(Q_{\lambda, j} \,\|\, Q(u \mid x_2, j)\big). \tag{4.45}$$

It was shown in [48] that the error exponent without an adversary is given by

$$E(0) = \max_{P_J, Q} F(P_J, Q, P_J). \tag{4.46}$$

The following theorem, our lower bound, recovers this quantity as a lower bound at $\beta = 0$.

Theorem 10 For a fraction $\beta$ of traitors, the error exponent is lower bounded by

$$E(\beta) \ge \max_{P_J, Q} \min_{\gamma} F(P_J, Q, \gamma) \tag{4.47}$$

where we impose the constraints that

$$\sum_j \gamma_j \ge 1 - 2\beta \quad \text{and} \quad \gamma_j \le P_J(j) \text{ for all } j \in \mathcal{J}. \tag{4.48}$$

To prove Theorem 10, we follow the path of [48] by presenting the bound in two steps, the second a generalization of the first. In Sec. 4.4.1 we state and prove a lemma that constitutes our loose bound. Then in Sec. 4.4.2, we tighten this bound to complete the proof of Theorem 10.

4.4.1 Preliminary Bound


Lemma 9 Let $U$ be a random variable such that $X - Y - U$ is a Markov chain and the distribution of $U$ is given by $Q(u \mid y)$. Let

$$Q(u \mid x) = \sum_{y} p(y \mid x)\, Q(u \mid y).$$

The error exponent is lower bounded by

$$E(\beta) \ge \max_{Q} \frac{\min_{x_1, x_2} (1 - 2\beta)\, D\big(Q_\lambda \,\|\, Q(u \mid x_1)\big)}{I(Y; U \mid X)} \tag{4.49}$$

where

$$Q_\lambda(u) = \frac{Q(u \mid x_1)^{1-\lambda}\, Q(u \mid x_2)^{\lambda}}{\sum_{u} Q(u \mid x_1)^{1-\lambda}\, Q(u \mid x_2)^{\lambda}}$$

and $\lambda$ is such that $D\big(Q_\lambda \,\|\, Q(u \mid x_1)\big) = D\big(Q_\lambda \,\|\, Q(u \mid x_2)\big)$.


Proof: We prove the lemma by applying Theorem 9. To do so, we must specify the auxiliary random variables $U_i$ as well as the function $f_q$ as a function of $q(u_L)$. For each $i$, $U_i$ has distribution conditioned on $Y_i$ given by $Q(u \mid y)$. We construct $f_q$ as follows. Given $q(u_L)$, select any set $S$ with $|S| = L - s$ and $q(u_S) = p(u_S)$. Let

$$f_q(u_L) = \arg\max_{x} p(x \mid u_S). \tag{4.50}$$

Set $R_i = I(Y; U \mid X) + \epsilon$ for all $i$. Note that the sum-rate is given by

$$R = L\, I(Y; U \mid X) + L\epsilon. \tag{4.51}$$

We now show that (4.17) is satisfied for sufficiently large $L$. For any $S$ with $|S| = L - 2s$ and $A \subset S$, we may write

\begin{align}
I(Y_A; U_A \mid U_{S \setminus A}) &= H(U_A \mid U_{S \setminus A}) - H(U_A \mid Y_A U_{S \setminus A}) \tag{4.52} \\
&\le H(U_A \mid U_{S \setminus A} X) + H(X) - H(U_A \mid Y_A U_{S \setminus A}) \tag{4.53} \\
&= H(U_A \mid X) - H(U_A \mid Y_A) + H(X) \tag{4.54} \\
&= \sum_{i \in A} \left[ H(U_i \mid X) - H(U_i \mid Y_i) \right] + H(X) \tag{4.55} \\
&= \sum_{i \in A} I(Y_i; U_i \mid X) + H(X) \tag{4.56} \\
&= |A|\, I(Y; U \mid X) + H(X) \tag{4.57} \\
&= \sum_{i \in A} R_i + H(X) - |A|\epsilon \tag{4.58}
\end{align}

where (4.54) follows because $U_A - X - U_{S \setminus A}$ and $U_A - Y_A - U_{S \setminus A}$ are Markov chains, (4.55) follows because $U_i$ does not depend on $U_j$ or $Y_j$ for $j \ne i$ after conditioning on $Y_i$ or $X$, and (4.57) because all the $U_i$ are distributed identically. Note that (4.58) satisfies (4.17) if $|A| \ge H(X)/\epsilon$. If $|A| < H(X)/\epsilon$, then $S \setminus A$ grows with $L$ because $s = \beta L$ so $|S| = (1 - 2\beta)L$; thus the conditioning term causes $I(Y_A; U_A \mid U_{S \setminus A})$ to shrink, and (4.17) is sure to be satisfied for sufficiently large $L$.

We now need to evaluate the right hand side of (4.20) to find the achieved distortion. For any $r(x, u_L)$, let

$$r(x, \tilde{x}, \hat{x}, u_L) = r(x, u_L)\, p(\tilde{x} \mid u_{H \cap S})\, p(\hat{x} \mid u_S). \tag{4.59}$$

The variables $\tilde{X}$ and $\hat{X}$ defined in this distribution are defined formally and have no counterpart in the operation of the code. However, note that we may upper bound the achieved distortion by

\begin{align}
D &\le E_r\big[ d_H(X, f_q(U_1, \ldots, U_L)) \big] \tag{4.60} \\
&\le E_r\big[ d_H(X, \hat{X}) \big] \tag{4.61} \\
&\le E_r\big[ d_H(X, \tilde{X}) \big] + E_r\big[ d_H(\tilde{X}, \hat{X}) \big] \tag{4.62}
\end{align}

where (4.61) follows because the true function $f_q$ chooses the most likely value of $X$ given $U_S$, whereas $\hat{X}$ is defined to be a randomly chosen value according to the a posteriori probability, which will certainly be a worse estimate; and (4.62) follows by the triangle inequality. We proceed to evaluate the two terms in (4.62). The first term depends only on the distribution of $X$ and $\tilde{X}$, which we may write

$$r(x, \tilde{x}) = \sum_{u_L} r(x, u_L)\, p(\tilde{x} \mid u_{H \cap S}) = \sum_{u_{H \cap S}} p(x, u_{H \cap S})\, p(\tilde{x} \mid u_{H \cap S}) \tag{4.63}$$

because $r(x, u_H) = p(x, u_H)$. The second term in (4.62) depends only on the distribution of $\tilde{X}$ and $\hat{X}$, which we may write

\begin{align}
r(\tilde{x}, \hat{x}) &= \sum_{u_L} r(u_L)\, p(\tilde{x} \mid u_{H \cap S})\, p(\hat{x} \mid u_S) \tag{4.64} \\
&= \sum_{u_S} p(u_S)\, p(\tilde{x} \mid u_{H \cap S})\, p(\hat{x} \mid u_S) \tag{4.65} \\
&= \sum_{u_S} p(\hat{x}, u_S)\, p(\tilde{x} \mid u_{H \cap S}) \tag{4.66} \\
&= \sum_{u_{H \cap S}} p(\hat{x}, u_{H \cap S})\, p(\tilde{x} \mid u_{H \cap S}) \tag{4.67}
\end{align}
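where we have used the fact that $r(u_S) = p(u_S)$. Note that the distribution of $(X, \tilde{X})$ is identical to that of $(\hat{X}, \tilde{X})$. Hence the two terms of (4.62) are the same, and we need only bound one of them. We may therefore write

\begin{align}
D/2 &\le E_r\big[ d_H(X, \tilde{X}) \big] = \Pr\nolimits_r(X \ne \tilde{X}) \tag{4.68} \\
&= \sum_{u_{H \cap S}} \sum_{x_1, x_2 \in \mathcal{X}: x_1 \ne x_2} p(x_1, u_{H \cap S})\, p(x_2 \mid u_{H \cap S}) \tag{4.69} \\
&= \sum_{u_{H \cap S}} \sum_{x_1, x_2 \in \mathcal{X}: x_1 \ne x_2} \frac{p(x_1) Q(u_{H \cap S} \mid x_1)\, p(x_2) Q(u_{H \cap S} \mid x_2)}{p(u_{H \cap S})}. \tag{4.70}
\end{align}

Let $\gamma = |S \cap H| / L$. Certainly $\gamma \ge 1 - 2\beta$. Let $t$ be the type of $u_{S \cap H}$. This is a type in space, rather than in time, and it is well defined because the alphabet for $U_i$ is the same for each $i$. For any $x_1 \in \mathcal{X}$,

\begin{align}
\frac{p(x_1) Q(u_{S \cap H} \mid x_1)}{p(u_{S \cap H})} &= \frac{p(x_1)\, 2^{-\gamma L [D(t \| Q(u|x_1)) + H(t)]}}{\sum_{x} p(x)\, 2^{-\gamma L [D(t \| Q(u|x)) + H(t)]}} \\
&\le 2^{-\gamma L [D(t \| Q(u|x_1)) - \min_x D(t \| Q(u|x)) - \delta]}
\end{align}

for any $\delta > 0$ and sufficiently large $L$. Therefore, summing over the sequences $u_{S \cap H}$ of type $t$,

\begin{align}
&\sum_{u_{S \cap H} \text{ of type } t} \frac{p(x_1) Q(u_{S \cap H} \mid x_1)\, p(x_2) Q(u_{S \cap H} \mid x_2)}{p(u_{S \cap H})} \\
&\quad \le 2^{\gamma L H(t)}\, 2^{-\gamma L [D(t \| Q(u|x_1)) - \min_x D(t \| Q(u|x)) - \delta]} \cdot p(x_2)\, 2^{-\gamma L [D(t \| Q(u|x_2)) + H(t)]} \\
&\quad \le 2^{-\gamma L [D(t \| Q(u|x_1)) + D(t \| Q(u|x_2)) - \min_x D(t \| Q(u|x)) - \delta]}
\end{align}

for sufficiently large $L$. Therefore, using the fact that the number of types $t$ is polynomial in $L$,

\begin{align}
-\frac{1}{L} \log \frac{D}{2} &\ge \min_{x_1, x_2: x_1 \ne x_2} \min_{t} \gamma \big[ D(t \| Q(u|x_1)) + D(t \| Q(u|x_2)) - \min_x D(t \| Q(u|x)) - \delta \big] \\
&= \min_{t} \operatorname{min2}_{x}\, \gamma D(t \| Q(u|x)) - \delta \tag{4.71}
\end{align}

where $\operatorname{min2}$ takes the second smallest value. It can be shown that this term involving the second smallest value over $x$ is the minimum Chernoff Information. That is,

$$-\frac{\log D}{L} \ge \min_{x_1, x_2} \gamma D\big(Q_\lambda \,\|\, Q(u \mid x_1)\big) - \delta - \frac{[\ldots]}{L}$$

where $Q_\lambda$ and $\lambda$ are defined by (4.49). Recalling that $\gamma \ge 1 - 2\beta$ and taking the limit as $\delta \to 0$ gives

$$\lim_{L \to \infty} \frac{-\log D}{L} \ge \min_{x_1, x_2} (1 - 2\beta)\, D\big(Q_\lambda \,\|\, Q(u \mid x_1)\big).$$

Applying (4.51) proves Lemma 9.

□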
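The exponent in Lemma 9 is a Chernoff information: the common value of the two divergences at the tilting parameter $\lambda$ that equalizes them. The sketch below computes it by bisection for two strictly positive distributions. It is an illustration under stated assumptions, not part of the proof: the function names are hypothetical, logarithms are taken base 2, and the inputs stand in for $Q(u \mid x_1)$ and $Q(u \mid x_2)$.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tilted(q1, q2, lam):
    """Q_lambda(u), proportional to q1(u)^(1 - lam) * q2(u)^lam."""
    weights = [a ** (1 - lam) * b ** lam for a, b in zip(q1, q2)]
    total = sum(weights)
    return [w / total for w in weights]


def chernoff_information(q1, q2, tol=1e-12):
    """Bisect on lambda until D(Q_lam || q1) equals D(Q_lam || q2)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        q_lam = tilted(q1, q2, lam)
        # D(Q_lam||q1) - D(Q_lam||q2) is nondecreasing in lambda.
        if kl(q_lam, q1) < kl(q_lam, q2):
            lo = lam
        else:
            hi = lam
    lam = (lo + hi) / 2
    return lam, kl(tilted(q1, q2, lam), q1)


# Hypothetical test channels for two source values x1 and x2.
q_x1 = [0.9, 0.1]
q_x2 = [0.1, 0.9]
lam, info = chernoff_information(q_x1, q_x2)
beta = 0.25
print(lam, info, (1 - 2 * beta) * info)
```

The last printed value is the numerator of (4.49) for this pair; the lemma additionally minimizes over pairs $x_1, x_2$, divides by $I(Y; U \mid X)$, and maximizes over $Q$.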


4.4.2  Tighter  Bound 

Now we improve this bound by introducing the additional auxiliary random variable $J$. Following the essential argument of [48], we alter our application of Theorem 9 so that the nodes are split into groups, each with a different method of quantization. Partition $\{1, \ldots, L\}$ into disjoint sets $\mathcal{R}_j$ such that $\big| |\mathcal{R}_j| - P_J(j) L \big| < 1$ for all $j$. For all $i \in \mathcal{R}_j$, the conditional distribution of $U_i$ given $Y_i$ is given by $Q(u \mid y, J = j)$. If $i \in \mathcal{R}_j$, we set $R_i = I(Y; U \mid X, J = j)$. Checking (4.17) follows along similar lines as it did in Sec. 4.4.1. The sum-rate becomes

$$R = \sum_j |\mathcal{R}_j|\, I(Y; U \mid X, J = j) \le L\, I(Y; U \mid X, J) + O(1). \tag{4.72}$$

The definition of $f_q$ is the same as in Sec. 4.4.1, accounting for the different distribution of the underlying variables. Let $\gamma_j = |\mathcal{R}_j \cap S \cap H| / L$. Then

$$\sum_j \gamma_j \ge 1 - 2\beta \quad \text{and} \quad \gamma_j \le P_J(j) \;\; \forall j \in \mathcal{J}. \tag{4.73}$$

Let $t_j$ be the type of $u_{\mathcal{R}_j \cap S \cap H}$. Thus

$$Q(u_{S \cap H} \mid x) = \prod_j 2^{-L \gamma_j [D(t_j \| Q(u|x,j)) + H(t_j)]}. \tag{4.74}$$

Applying this to (4.70) yields

\begin{align}
-\frac{\log D}{L} &\ge \min_{t_j} \operatorname{min2}_{x} \sum_j \gamma_j D\big(t_j \,\|\, Q(u \mid x, j)\big) - \delta - \frac{[\ldots]}{L} \\
&\ge \min_{x_1, x_2} \sum_j \gamma_j D\big(Q_{\lambda, j} \,\|\, Q(u \mid x_1, j)\big) - \delta - \frac{[\ldots]}{L} \tag{4.75}
\end{align}

where $Q_{\lambda, j}$ is given by (4.44) and (4.45). Extending (4.75) to minimize over all $\gamma_j$ satisfying (4.73), then combining the result with (4.72) completes the proof of Theorem 10.
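The identity behind (4.74) is the standard method-of-types fact that the probability of an i.i.d. sequence depends only on its type: a sequence of length $n$ and type $t$ has probability $2^{-n[D(t \| Q) + H(t)]}$ under $Q$. The short check below verifies this numerically; the distribution `q` and the sequence `seq` are hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def sequence_probability(seq, q):
    """Probability of seq when symbols are drawn i.i.d. from q."""
    return math.prod(q[a] for a in seq)


def type_exponent(seq, q):
    """Return n * (D(t || q) + H(t)) in bits, where t is the type of seq."""
    n = len(seq)
    t = {a: c / n for a, c in Counter(seq).items()}
    entropy = -sum(ta * math.log2(ta) for ta in t.values())
    divergence = sum(ta * math.log2(ta / q[a]) for a, ta in t.items())
    return n * (divergence + entropy)


q = {"a": 0.7, "b": 0.2, "c": 0.1}  # hypothetical Q(u | x)
seq = list("aababcaaab")            # hypothetical sequence of agent outputs

direct = sequence_probability(seq, q)
via_type = 2 ** (-type_exponent(seq, q))
print(direct, via_type)
```

In (4.74) the same identity is applied group by group, with $n$ replaced by $L\gamma_j$ and $q$ by $Q(u \mid x, j)$.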

4.5  Outer  Bound  on  Error  Exponent  for  Discrete  Sources 

Recall the definition of $F(P_J, Q, \gamma)$ in Sec. 4.4, as we use it again in the statement of our upper bound on the error exponent, which is stated as follows.

Theorem 11 For a fraction $\beta$ of traitors, the error exponent is upper bounded as

$$E(\beta) \le \min_{\gamma} \max_{P_J, Q} F(P_J, Q, \gamma) \tag{4.76}$$

where $\gamma$ and $P_J$ are constrained so that

$$\sum_j \gamma_j \ge 1 - 2\beta \quad \text{and} \quad \gamma_j \le P_J(j) \text{ for all } j \in \mathcal{J}. \tag{4.77}$$

Note that the upper bound in Theorem 11 differs from the lower bound in Theorem 10 only by a reordering of the maximum and minimum. Moreover, the two bounds meet at $\beta = 0$ and together recover the result of [48], giving the error exponent with no adversary, as stated in (4.46). The proof of Theorem 11 follows.
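Reordering a maximum and a minimum can only move the value in one direction: a max-min never exceeds the corresponding min-max. The toy sketch below illustrates this on a hypothetical payoff table, with rows standing in for the maximizer's choice (the role of $(P_J, Q)$) and columns for the minimizer's choice (the role of $\gamma$). It ignores the coupling constraint $\gamma_j \le P_J(j)$ between the two players, so it shows only the general ordering, not the bounds themselves.

```python
def max_min(table):
    """Maximizer picks a row first, then the minimizer picks a column."""
    return max(min(row) for row in table)


def min_max(table):
    """Minimizer picks a column first, then the maximizer picks a row."""
    n_cols = len(table[0])
    return min(max(row[j] for row in table) for j in range(n_cols))


# Hypothetical values of an objective F(row, column).
table = [
    [3.0, 1.0],
    [2.0, 4.0],
]
print(max_min(table), min_max(table))
```

When the table has a saddle point the two values coincide, which is the situation at $\beta = 0$ noted above.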

Proof: Recall that if node $i$ is honest, the codeword $C_i$ transmitted to the decoder is given by $f_i(Y_i^n)$. Define a distribution on $X^n$ and $C_L$ as

$$P(x^n, c_L) = p(x^n) \sum_{y_L^n} \prod_{i=1}^{L} p(y_i^n \mid x^n)\, \mathbf{1}\big(c_i = f_i(y_i^n)\big).$$

We will refer to various marginals and conditionals of this distribution as well.

Let $X_t^c = (X(1), \ldots, X(t-1), X(t+1), \ldots, X(n))$. For any $t$ and $x_t^c$, define $U_i(t, x_t^c)$ to be a random variable distributed with $X(t)$ and $Y_i(t)$ such that

$$\Pr\big(X(t) = x, Y_i(t) = y, U_i(t, x_t^c) = c\big) = p(x, y)\, \Pr\big(C_i = c \mid Y_i(t) = y, X_t^c = x_t^c\big).$$

Note that $X(t) - Y_i(t) - U_i(t, x_t^c)$ is a Markov chain.

Suppose the adversary performs the following attack. It selects a set $S \subset \{1, \ldots, L\}$ with $|S| = (1 - \beta)L$ and $|H \cap S| = (1 - 2\beta)L$, where $H$ is the true set of honest nodes; i.e. $H^c$ are the traitors. The set $S$ is the traitors' target set, which they endeavor to fool the decoder into thinking may be the true set of honest nodes. They generate a sequence $X'^n$ from the distribution $P(x^n \mid c_{H \cap S})$. Finally, they construct $C_{S \setminus H}$ just as honest nodes would if $X'^n$ were the truth. That is, from $X'^n$, they generate $C_{S \setminus H}$ from the distribution $P(c_{S \setminus H} \mid x'^n)$, and transmit this $C_{S \setminus H}$ to the decoder.
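The set sizes in this attack are forced by counting: with $s = \beta L$ traitors, a target set of size $L - s$ that contains every traitor must overlap the honest set in exactly $L - 2s$ nodes. The sketch below builds one such target set. The function name `choose_target_set` and the rule for which honest nodes to drop are hypothetical; the attack as described only requires the two cardinalities.

```python
def choose_target_set(L, honest, s):
    """Return a target set S with |S| = L - s and |H ∩ S| = L - 2s."""
    honest = sorted(honest)
    traitors = sorted(set(range(L)) - set(honest))
    if len(traitors) != s or L - 2 * s < 0:
        raise ValueError("need exactly s traitors and s <= L / 2")
    # Drop s honest nodes (arbitrarily, the first s) and add every traitor.
    return set(honest[s:]) | set(traitors)


L, s = 10, 3
honest = set(range(7))          # hypothetical honest set H; nodes 7, 8, 9 are traitors
target = choose_target_set(L, honest, s)
print(sorted(target), len(target), len(target & honest))
```

Swapping the roles of `honest` and `target` gives a valid instance of the same attack, which is the symmetry the proof exploits.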

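Observe that $X^n, X'^n, C_L$ will be distributed according to

$$P(x^n, c_H)\, P(x'^n \mid c_{H \cap S})\, P(c_{S \setminus H} \mid x'^n) = \frac{P(x^n, c_H)\, P(x'^n, c_S)}{P(c_{H \cap S})}.$$

This distribution is symmetric in $x^n$ and $x'^n$. In particular, if $S$ were the true set of honest nodes, and the traitors performed an analogous attack selecting the set $H$ as their target set, then precisely the same distribution among $X^n, X'^n, C_L$ would result, except that $X^n$ and $X'^n$ would switch roles. Hence, if the decoder achieves a distortion of $D$, that is, if $\hat{X}^n$ is such that $D \ge \frac{1}{n} E\, d_H(X^n, \hat{X}^n)$, then it must also be that $D \ge \frac{1}{n} E\, d_H(X'^n, \hat{X}^n)$, because the decoder can only generate one estimate, but it must work in both situations. Therefore

\begin{align}
D &\ge \frac{1}{2n} E\big[ d_H(X^n, \hat{X}^n) + d_H(X'^n, \hat{X}^n) \big] \\
&\ge \frac{1}{2n} E\, d_H(X^n, X'^n) \tag{4.78} \\
&= \frac{1}{2n} \sum_{t=1}^{n} \Pr\big(X(t) \ne X'(t)\big) \\
&= \frac{1}{2n} \sum_{t=1}^{n} \sum_{x(t) \ne x'(t),\, c_L} \frac{P(x(t), c_H)\, P(x'(t), c_S)}{P(c_{H \cap S})} \\
&= \frac{1}{2n} \sum_{t=1}^{n} \sum_{x(t) \ne x'(t),\, c_{H \cap S}} \frac{P(x(t), c_{H \cap S})\, P(x'(t), c_{H \cap S})}{P(c_{H \cap S})} \tag{4.79}
\end{align}

where we used the triangle inequality in (4.78). We write $D(t)$ for the inner sum in (4.79) at time index $t$. The expression in (4.79) can be shown to be concave in $P$. We may write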


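\begin{align}
P(x(t), c_{H \cap S}) &= \sum_{x_t^c,\, y_{H \cap S}^n} p(x^n) \prod_{i \in H \cap S} p(y_i^n \mid x^n)\, \mathbf{1}\big(c_i = f_i(y_i^n)\big) \\
&= \sum_{x_t^c} p(x^n) \prod_{i \in H \cap S} \sum_{y_i(t)} p(y_i(t) \mid x(t)) \sum_{y_{i,t}^c} p(y_{i,t}^c \mid x_t^c)\, \mathbf{1}\big(c_i = f_i(y_i^n)\big) \\
&= \sum_{x_t^c} p(x^n) \prod_{i \in H \cap S} \sum_{y} p(y \mid x(t))\, \Pr\big(C_i = c_i \mid X_t^c = x_t^c, Y_i(t) = y\big) \\
&= E_{X_t^c}\, p(x(t)) \prod_{i \in H \cap S} \sum_{y} p(y \mid x(t))\, \Pr\big(U_i(t, X_t^c) = c_i \mid Y_i(t) = y\big) \\
&= E_{X_t^c}\, p(x(t)) \prod_{i \in H \cap S} \Pr\big(U_i(t, X_t^c) = c_i \mid X(t) = x(t)\big). \tag{4.80}
\end{align}

Define for convenience

$$P(x, u_{H \cap S} \mid t, x_t^c) = p(x) \prod_{i \in H \cap S} \Pr\big(U_i(t, x_t^c) = u_i \mid X(t) = x\big).$$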
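
Substituting (4.80) and (4.5) into (4.79) and using concavity gives

\begin{align}
D(t) &\ge E_{X_t^c} \sum_{x_1 \ne x_2} \sum_{u_{H \cap S}} \frac{P(x_1, u_{H \cap S} \mid t, X_t^c)\, P(x_2, u_{H \cap S} \mid t, X_t^c)}{\sum_{x_3} P(x_3, u_{H \cap S} \mid t, X_t^c)} \\
&\ge |\mathcal{X}|^{-1} E_{X_t^c} \max_{x_1 \ne x_2} \sum_{u_{H \cap S}} \frac{P(x_1, u_{H \cap S} \mid t, X_t^c)\, P(x_2, u_{H \cap S} \mid t, X_t^c)}{\max_{x_3} P(x_3, u_{H \cap S} \mid t, X_t^c)}.
\end{align}

Let

$$\mathcal{U}_x = \Big\{ u_{H \cap S} : x = \arg\max_{x'} p(x') \prod_{i \in H \cap S} Q\big(u_i(t, X_t^c) \mid x'\big) \Big\}.$$

Then

\begin{align}
D(t) &\ge |\mathcal{X}|^{-1} E_{X_t^c} \max_{x_1 \ne x_2} \sum_{x_3} \sum_{u_{H \cap S} \in \mathcal{U}_{x_3}} \frac{P(x_1, u_{H \cap S} \mid t, X_t^c)\, P(x_2, u_{H \cap S} \mid t, X_t^c)}{P(x_3, u_{H \cap S} \mid t, X_t^c)} \\
&\ge |\mathcal{X}|^{-1} E_{X_t^c} \max_{x_1 \ne x_2,\, x_3} \sum_{u_{H \cap S} \in \mathcal{U}_{x_3}} \frac{P(x_1, u_{H \cap S} \mid t, X_t^c)\, P(x_2, u_{H \cap S} \mid t, X_t^c)}{P(x_3, u_{H \cap S} \mid t, X_t^c)}. \tag{4.81}
\end{align}

For fixed $x_3$, if both $x_1$ and $x_2$ are different from $x_3$, we can always increase the value in (4.81) by making $x_1$ or $x_2$ equal to $x_3$. Hence, we need only consider cases in which either $x_1 = x_3$ or $x_2 = x_3$. Thus

\begin{align}
D(t) &\ge |\mathcal{X}|^{-1} E_{X_t^c} \max_{x_1 \ne x_2} \sum_{u_{H \cap S} \in \mathcal{U}_{x_2}} P(x_1, u_{H \cap S} \mid t, X_t^c) \\
&= |\mathcal{X}|^{-1} E_{X_t^c} \max_{x_1 \ne x_2} p(x_1)\, \Pr\big(\mathcal{U}_{x_2} \mid x_1, X_t^c\big).
\end{align}

Using ideas from [48], we have that

$$\Pr\big(\mathcal{U}_{x_2} \mid x_1, X_t^c\big) \ge 2^{-\sum_{i \in H \cap S} D\left(Q_\lambda^{(i)} \,\|\, \Pr(U_i(t, X_t^c) \mid x_1)\right) - o(L)}$$

where

$$Q_\lambda^{(i)}(u) = \frac{\Pr^{1-\lambda}\big(U_i(t, X_t^c) = u \mid x_1\big)\, \Pr^{\lambda}\big(U_i(t, X_t^c) = u \mid x_2\big)}{\Delta^{(i)}} \tag{4.82}$$

with $\Delta^{(i)}$ a normalizing constant and $\lambda$ chosen such that

$$\sum_{i \in H \cap S} D\big(Q_\lambda^{(i)} \,\|\, \Pr(U_i(t, x_t^c) \mid x_1)\big) = \sum_{i \in H \cap S} D\big(Q_\lambda^{(i)} \,\|\, \Pr(U_i(t, x_t^c) \mid x_2)\big). \tag{4.83}$$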
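
Hence

$$D(t) \ge E_{X_t^c}\, 2^{-\min_{x_1, x_2} \sum_{i \in H \cap S} D\left(Q_\lambda^{(i)} \,\|\, \Pr(U_i(t, X_t^c) \mid x_1)\right) - o(L)}. \tag{4.84}$$

Putting (4.84) back into (4.79) gives

\begin{align}
-\log D &\le -\log \frac{1}{n} \sum_{t=1}^{n} E_{X_t^c}\, 2^{-\min_{x_1, x_2} \sum_{i \in H \cap S} D\left(Q_\lambda^{(i)} \,\|\, \Pr(U_i(t, X_t^c) \mid x_1)\right) - o(L)} \\
&\le \frac{1}{n} \sum_{t=1}^{n} E_{X_t^c} \Big[ \min_{x_1, x_2} \sum_{i \in H \cap S} D\big(Q_\lambda^{(i)} \,\|\, \Pr(U_i(t, X_t^c) \mid x_1)\big) + o(L) \Big] \tag{4.85}
\end{align}

where we have used Jensen's inequality in (4.85).

A chain of standard inequalities (see [48]) yields

$$R = \sum_{i=1}^{L} R_i \ge \frac{1}{n} \sum_{t=1}^{n} \sum_{i=1}^{L} I\big(Y_i(t); U_i(t, X_t^c) \mid X(t)\big). \tag{4.86}$$

Putting (4.85) together with (4.86) and using the fact that

$$\frac{\sum_i A_i}{\sum_i B_i} \le \max_i \frac{A_i}{B_i}$$

for any nonnegative $A_i$ and $B_i$ we get

\begin{align}
\frac{-\log D}{R} &\le \max_{t, x_t^c} \frac{\min_{x_1, x_2} \sum_{i \in H \cap S} D\big(Q_\lambda^{(i)} \,\|\, \Pr(U_i(t, x_t^c) \mid x_1)\big) + o(L)}{\sum_{i=1}^{L} I\big(Y_i(t); U_i(t, x_t^c) \mid X(t)\big)} \\
&\le \max_{U_i: X \to Y_i \to U_i} \frac{\min_{x_1, x_2} \frac{1}{L} \sum_{i \in H \cap S} D\big(Q_\lambda^{(i)} \,\|\, Q(u_i \mid x_1)\big)}{\frac{1}{L} \sum_{i=1}^{L} I(Y_i; U_i \mid X)} + \epsilon. \tag{4.87}
\end{align}

Observing that the choices of $H$ and $S$ could have been made differently by the traitors, we introduce a vector $\gamma_i$ for $i = 1, \ldots, L$ under the constraints

$$\gamma_i \in \{0, \tfrac{1}{L}\} \quad \text{and} \quad \sum_i \gamma_i = 1 - 2\beta. \tag{4.88}$$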

This allows us to tighten (4.87) to

$$\frac{-\log D}{R} \le \min_{\gamma} \max_{U_i: X \to Y_i \to U_i} \frac{\min_{x_1, x_2} \sum_{i=1}^{L} \gamma_i D\big(Q_\lambda^{(i)} \,\|\, Q(u_i \mid x_1)\big)}{\frac{1}{L} \sum_{i=1}^{L} I(Y_i; U_i \mid X)} + \epsilon. \tag{4.89}$$

We claim that the value of (4.89) does not change if we replace (4.88) with

$$\gamma_i \le \frac{1}{L} \quad \text{and} \quad \sum_i \gamma_i \ge 1 - 2\beta. \tag{4.90}$$

This is because we may use arbitrarily large $L$, so any $\gamma_i$ satisfying (4.88) can be closely approximated by a $\gamma_i$ satisfying (4.90). Furthermore, we introduce a variable $I$ with values in $\{1, \ldots, L\}$ such that

$$\Pr(U = u \mid I = i, Y = y) = \Pr(U_i = u \mid Y = y)$$

and maintaining the condition $\gamma_i \le P_I(i)$ for all $i = 1, \ldots, L$. Doing so gives

\begin{align}
\frac{-\log D}{R} &\le \min_{\gamma} \max_{P_I, Q} \frac{\min_{x_1, x_2} \sum_i \gamma_i D\big(Q_{\lambda, i} \,\|\, Q(u \mid x_1, i)\big)}{I(Y; U \mid X, I)} \\
&= \min_{\gamma} \max_{P_I, Q} F(P_I, Q, \gamma).
\end{align}

Replacing $I$ with a variable $J$ over an arbitrary alphabet proves (4.76). Note that in this process (4.82), (4.83), and (4.90) have become (4.44), (4.45), and (4.77) respectively. □


4.6  Inner  Bound  on  Rate-Distortion  Region  for  the 
Quadratic  Gaussian  Problem 

With no adversary, the rate-distortion region for the quadratic Gaussian problem was found simultaneously in [55] and [56]. They found that with $s = 0$, the tuple $(R_1, \ldots, R_L, D)$ is achievable if and only if there exist $r_i$ for $i = 1, \ldots, L$ such that

1. for all $A \subset \{1, \ldots, L\}$,

$$\sum_{i \in A} R_i \ge \sum_{i \in A} r_i + \frac{1}{2} \log \frac{1}{D} - \frac{1}{2} \log \left( \frac{1}{\sigma_X^2} + \sum_{i \in A^c} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2} \right) \tag{4.91}$$

2. the distortion $D$ is bounded by

$$\frac{1}{D} \le \frac{1}{\sigma_X^2} + \sum_{i=1}^{L} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2}. \tag{4.92}$$

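The following theorem gives our inner bound on the rate-distortion region with an adversary.

Theorem 12 The tuple $(R_1, \ldots, R_L, D)$ is achievable if there exist $r_i$ for $i = 1, \ldots, L$ and for each matrix $\Sigma \in \mathbb{R}^{L \times L}$ there exist constants $c_i(\Sigma)$ such that

1. for all $S \subset \{1, \ldots, L\}$ with $|S| = L - 2s$ and all $A \subset S$,

$$\sum_{i \in A} R_i \ge \sum_{i \in A} r_i + \frac{1}{2} \log \left( \frac{1}{\sigma_X^2} + \sum_{i \in S} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2} \right) - \frac{1}{2} \log \left( \frac{1}{\sigma_X^2} + \sum_{i \in S \setminus A} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2} \right) \tag{4.93}$$

2. for every $S \subset \{1, \ldots, L\}$ with $|S| = L - s$ and every vector $\lambda \in \mathbb{R}^L$ for which

$$\Sigma_{ij} = \sigma_X^2 + \frac{\sigma_{N_i}^2}{1 - \exp(-2 r_i)}\, \delta_{ij} \quad \text{for all } i, j \in S \tag{4.94}$$

and $\lambda_i = \sigma_X^2$ for $i \in H$,

$$D \ge E_{\Sigma, \lambda} \Big( X - \sum_{i=1}^{L} c_i(\Sigma)\, U_i \Big)^2 \tag{4.95}$$

where by $E_{\Sigma, \lambda}$ we mean an expectation taken over a distribution on the variables $(X, U_1, \ldots, U_L)$ with covariance matrix

$$\begin{bmatrix} \sigma_X^2 & \lambda^T \\ \lambda & \Sigma \end{bmatrix}. \tag{4.96}$$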

Proof: Again we apply Theorem 9. We define $U_i$ as

$$U_i = Y_i + W_i \tag{4.97}$$

where $W_i$ is a Gaussian random variable with zero mean and variance $\sigma_{W_i}^2$. The estimation function $f_q$ is determined by the sample covariance matrix of $q(u_L)$, which we denote $\Sigma$. Then let

$$f_q(u_L) = \sum_{i=1}^{L} c_i(\Sigma)\, u_i. \tag{4.98}$$

Consider first the rate condition in the statement of Theorem 12. Define (just for this section)

$$r_i = I(Y_i; U_i \mid X) = \frac{1}{2} \log \frac{\sigma_{N_i}^2 + \sigma_{W_i}^2}{\sigma_{W_i}^2}.$$

There is a one-to-one correspondence between $r_i$ and $\sigma_{W_i}^2$, so we can write everything in terms of $r_i$ instead of $\sigma_{W_i}^2$. It is not hard to show that

$$I(Y_A; U_A \mid U_{S \setminus A}) = \sum_{i \in A} r_i + \frac{1}{2} \log \left( \frac{1}{\sigma_X^2} + \sum_{i \in S} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2} \right) - \frac{1}{2} \log \left( \frac{1}{\sigma_X^2} + \sum_{i \in S \setminus A} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2} \right).$$

Hence (4.93) follows from (4.17).

Now consider the distortion condition in the statement of Theorem 12. Any distribution $r(x, u_L)$ has a covariance matrix which we parameterize as in (4.96). The condition (4.94) is precisely the same as (4.19) in that the marginal distribution of $U_S$ is exactly the honest distribution. Therefore (4.95) follows from (4.20). □
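The one-to-one correspondence between $r_i$ and $\sigma_{W_i}^2$ used in this proof can be checked numerically. The sketch below converts in both directions and verifies the identity $(1 - \exp(-2 r_i)) / \sigma_{N_i}^2 = 1 / (\sigma_{N_i}^2 + \sigma_{W_i}^2)$, which is what lets (4.93) and (4.94) be written in terms of $r_i$ alone. The function names and values are hypothetical, and rates are in nats.

```python
import math


def rate_from_noise(var_n, var_w):
    """r = I(Y; U | X) = 0.5 * log((var_n + var_w) / var_w) for U = Y + W."""
    return 0.5 * math.log((var_n + var_w) / var_w)


def noise_from_rate(var_n, r):
    """Invert rate_from_noise: var_w = var_n / (exp(2 r) - 1)."""
    return var_n / math.expm1(2.0 * r)


var_n, var_w = 2.0, 0.5          # hypothetical variances
r = rate_from_noise(var_n, var_w)

print(r, noise_from_rate(var_n, r))
# Both sides of the identity used to rewrite (4.93) and (4.94):
print((1.0 - math.exp(-2.0 * r)) / var_n, 1.0 / (var_n + var_w))
```

A smaller quantization noise $\sigma_{W_i}^2$ corresponds to a larger $r_i$, and $r_i \to 0$ as $\sigma_{W_i}^2 \to \infty$.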




4.7  Outer  Bound  on  Rate-Distortion  Region  for  the 
Quadratic  Gaussian  Problem 

The  following  theorem  gives  our  outer  bound  on  the  rate-distortion  region  for  the 
quadratic  Gaussian  CEO  Problem  with  an  adversary. 


Theorem 13 If the tuple $(R_1, \ldots, R_L, D)$ is achievable, then there exist $r_i$ for $i = 1, \ldots, L$ such that for all $S \subset \{1, \ldots, L\}$ with $|S| = L - 2s$ and all $A \subset S$,

$$\sum_{i \in A} R_i \ge \sum_{i \in A} r_i + \frac{1}{2} \log \frac{1}{D} - \frac{1}{2} \log \left( \frac{1}{\sigma_X^2} + \sum_{i \in S \setminus A} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2} \right) \tag{4.99}$$

$$\frac{1}{D} \le \frac{1}{\sigma_X^2} + \sum_{i \in S} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2}. \tag{4.100}$$

The region specified in our outer bound in Theorem 13 is identical to the rate region for the non-Byzantine problem given in [55, 56], and stated in (4.91)-(4.92), except that the two conditions on $\{1, \ldots, L\}$ have been replaced with conditions on $S$ for all sets of size $L - 2s$. Together our inner and outer bounds match at $s = 0$ and recover the non-adversary result of [55, 56].

Proof: Assume $(R_1, \ldots, R_L, D)$ is achievable, and consider a code that achieves it with codewords $(C_1, \ldots, C_L)$. We may assume without loss of generality that the code achieves distortion $D$ with probability at least $1 - \epsilon$, because we can always repeat the code multiple times and apply the law of large numbers. Fix $S \subset \{1, \ldots, L\}$ with $|S| = L - 2s$. We may write

\begin{align}
\sum_{i \in A} R_i &\ge \sum_{i \in A} \frac{1}{n} H(C_i) \\
&\ge \frac{1}{n} H(C_A) \\
&\ge \frac{1}{n} H(C_A \mid C_{S \setminus A}) \\
&\ge \frac{1}{n} I(Y_A^n; C_A \mid C_{S \setminus A}) \\
&= \frac{1}{n} I(Y_A^n, X^n; C_A \mid C_{S \setminus A}) \\
&= \frac{1}{n} I(X^n; C_A \mid C_{S \setminus A}) + \frac{1}{n} I(Y_A^n; C_A \mid X^n, C_{S \setminus A}) \\
&= \frac{1}{n} I(X^n; C_S) - \frac{1}{n} I(X^n; C_{S \setminus A}) + \sum_{i \in A} \frac{1}{n} I(Y_i^n; C_i \mid X^n). \tag{4.101}
\end{align}

We define (for this section)

$$r_i = \frac{1}{n} I(Y_i^n; C_i \mid X^n). \tag{4.102}$$

Lemma 3.1 in [56] states that for any $B \subset \{1, \ldots, L\}$,

$$\frac{1}{\sigma_X^2} \exp\left( \frac{2}{n} I(X^n; C_B) \right) \le \frac{1}{\sigma_X^2} + \sum_{i \in B} \frac{1 - \exp(-2 r_i)}{\sigma_{N_i}^2} \tag{4.103}$$

which allows us to bound the second term in (4.101). Only the first term remains, which we may rewrite as

$$\frac{1}{n} I(X^n; C_S) = \frac{1}{n} h(X^n) - \frac{1}{n} h(X^n \mid C_S) = \frac{1}{2} \log 2\pi e \sigma_X^2 - \frac{1}{n} h(X^n \mid C_S). \tag{4.104}$$

We will proceed to show that

$$\frac{1}{n} h(X^n \mid C_S) \le \frac{1}{2} \log 2\pi e D \tag{4.105}$$

which, combined with (4.102), (4.103), and (4.104), allows us to extend (4.101) to (4.99). Taking $A = \emptyset$ gives (4.100).


We now prove (4.105). Let $H_1, H_2$ be sets of size $L - s$ such that $S = H_1 \cap H_2$. If $H_1$ is the true set of honest nodes, then they use the deterministic encoding functions $f_i$ to get $C_{H_1}$ from $Y_{H_1}^n$. Meanwhile, the traitors, $H_1^c$, choose $C_{H_1^c}$. The decoder's estimate $\hat{X}^n$ is a deterministic function of $C_L$, but when $H_1$ are the honest nodes, we can think of it as a deterministic function of $Y_{H_1}^n$ and $C_{H_1^c}$. Thus we can define the set

$$S_D(X, Y_{H_1}) = \Big\{ (x^n, y_{H_1}^n) : \tfrac{1}{n} d(x^n, \hat{x}^n) \le D \text{ for all } c_{H_1^c} \Big\}.$$

This is the set of all $(x^n, y_{H_1}^n)$ pairs for which $\hat{X}^n$ achieves the distortion constraint no matter what the traitors do. Because we assume that distortion $D$ is achieved with probability nearly one, the probability of the set $S_D(X, Y_{H_1})$ is also nearly one. We define the set $S_D(X, Y_{H_2})$ in an analogous fashion, in the case that $H_2$ is the true set of honest nodes. Since a code achieving distortion $D$ must perform no matter which nodes are the traitors, $S_D(X, Y_{H_2})$ is also a set with probability nearly one.

Now define

$$Q_D(X, Y_S) = \big\{ (x^n, y_S^n) : \exists\, y_{H_1 \setminus H_2}^n, y_{H_2 \setminus H_1}^n : (x^n, y_{H_1}^n) \in S_D(X, Y_{H_1}),\ (x^n, y_{H_2}^n) \in S_D(X, Y_{H_2}) \big\}. \tag{4.106}$$

That is, $Q_D(X, Y_S)$ is the set of pairs $(x^n, y_S^n)$ such that $\hat{X}^n$ achieves the distortion constraint for some $y_{H_1 \setminus H_2}^n$ when $H_1$ are the honest nodes and some $y_{H_2 \setminus H_1}^n$ when $H_2$ are the honest nodes. Because the $S_D$ sets have probability nearly one, so does $Q_D$.

For a fixed $y_S^n$, define the conditional version of $Q_D$ as

$$Q_D(X \mid y_S^n) = \{ x^n : (x^n, y_S^n) \in Q_D(X, Y_S) \}.$$

Note that

\begin{align}
1 - \epsilon &\le \Pr\big(Q_D(X, Y_S)\big) \\
&= \int_{Q_D(X, Y_S)} dx^n\, dy_S^n\, p(x^n, y_S^n) \\
&= \int dy_S^n\, p(y_S^n) \int_{Q_D(X \mid y_S^n)} dx^n\, p(x^n \mid y_S^n) \\
&= \int dy_S^n\, p(y_S^n)\, \Pr\big(Q_D(X \mid y_S^n) \mid y_S^n\big).
\end{align}

Since this is a convex combination nearly equal to 1, each individual value must nearly equal 1, so in particular the probability of $Q_D(X \mid y_S^n)$ is nearly 1 given $y_S^n$.

Fix a codeword $c_S$. Define

$$Q_D(X \mid c_S) = \bigcup_{y_S^n : f_S(y_S^n) = c_S} Q_D(X \mid y_S^n).$$

From the high probability property of $Q_D(X \mid y_S^n)$, it follows that $Q_D(X \mid c_S)$ has high probability conditioned on $c_S$ being sent. Hence

$$\frac{1}{n} h(X^n \mid C_S) \le \frac{1}{n} \max_{c_S} \log \operatorname{Vol}\big(Q_D(X \mid c_S)\big). \tag{4.107}$$

Consider two elements $x^n, x'^n$ of $Q_D(X \mid c_S)$. By definition, there must be some sequences $y_S^n, y'^n_S$ such that $(x^n, y_S^n), (x'^n, y'^n_S) \in Q_D(X, Y_S)$. From the definition of $Q_D$, there must be sequences $y_{H_1 \setminus H_2}^n$ and $y'^n_{H_2 \setminus H_1}$ extending $y_S^n$ and $y'^n_S$ respectively such that $(x^n, y_{H_1}^n) \in S_D(X, Y_{H_1})$ and $(x'^n, y'^n_{H_2}) \in S_D(X, Y_{H_2})$. Consider the case that $c_S$, $c_{H_1 \setminus H_2} = f_{H_1 \setminus H_2}(y_{H_1 \setminus H_2}^n)$, and $c_{H_2 \setminus H_1} = f_{H_2 \setminus H_1}(y'^n_{H_2 \setminus H_1})$ are sent. First observe that this set of messages could have been produced if $X^n = x^n$, $Y_{H_1}^n = y_{H_1}^n$, and $H_1$ were the set of honest nodes. Then the nodes in $H_2 \setminus H_1$, which are all traitors, could send $c_{H_2 \setminus H_1}$. Since $(x^n, y_{H_1}^n) \in S_D(X, Y_{H_1})$, by definition the estimate $\hat{x}^n$ produced at the decoder must satisfy $\frac{1}{n} d(x^n, \hat{x}^n) \le D$. However, the same set of messages could have been produced if $X^n = x'^n$, $Y_{H_2}^n = y'^n_{H_2}$, and $H_2$ were the set of honest nodes, where $H_1 \setminus H_2$ decide to send $c_{H_1 \setminus H_2}$. Since the decoder produces just one estimate for any input messages, the very same estimate $\hat{x}^n$, by the same reasoning, must satisfy $\frac{1}{n} d(x'^n, \hat{x}^n) \le D$. Hence, we have

$$\frac{1}{n} \sum_{t=1}^{n} \big(x(t) - \hat{x}(t)\big)^2 \le D,$$

$$\frac{1}{n} \sum_{t=1}^{n} \big(x'(t) - \hat{x}(t)\big)^2 \le D.$$

We may rewrite this as

$$\| x - \hat{x} \|_2 \le \sqrt{nD},$$

$$\| x' - \hat{x} \|_2 \le \sqrt{nD}.$$

Therefore by the triangle inequality, for any $x^n, x'^n \in Q_D(X \mid c_S)$,

$$\| x - x' \|_2 \le 2\sqrt{nD}.$$

That is, $Q_D(X \mid c_S)$ has diameter at most $2\sqrt{nD}$. The following lemma bounds the volume of subsets of $\mathbb{R}^n$ as a function of their diameter. It is proved in Sec. 4.7.1.

Lemma 10 The volume of any subset of $\mathbb{R}^n$ is no more than that of the $n$-ball with the same diameter.

Using Lemma 10, we have that the volume of $Q_D(X \mid c_S)$ is no more than the volume of an $n$-ball with radius $\sqrt{nD}$. It can be easily shown that such a ball has volume no more than $(2\pi e D)^{n/2}$. Applying this to (4.107) gives (4.105), completing the proof. □
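The claim that an $n$-ball of radius $\sqrt{nD}$ has volume at most $(2\pi e D)^{n/2}$ follows from $\Gamma(n/2 + 1) \ge (n/2e)^{n/2}$ and can be checked numerically. The sketch below compares the two quantities in the log domain to avoid overflow; the function names and the value of `D` are hypothetical, and the ball volume uses the standard formula $\pi^{n/2} r^n / \Gamma(n/2 + 1)$.

```python
import math


def log_ball_volume(n, radius):
    """Natural log of the volume of an n-ball: pi^(n/2) r^n / Gamma(n/2 + 1)."""
    return (n / 2) * math.log(math.pi) - math.lgamma(n / 2 + 1) + n * math.log(radius)


def log_volume_bound(n, D):
    """Natural log of (2 * pi * e * D)^(n/2)."""
    return (n / 2) * math.log(2 * math.pi * math.e * D)


D = 0.3  # hypothetical distortion level
for n in (1, 2, 10, 100, 1000):
    gap = log_volume_bound(n, D) - log_ball_volume(n, math.sqrt(n * D))
    # The per-dimension gap shrinks with n, so the bound is tight in the exponent.
    print(n, gap, gap / n)
```

Because the per-dimension gap vanishes as $n$ grows, dividing the log-volume by $n$ yields exactly the $\frac{1}{2} \log 2\pi e D$ appearing in (4.105).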


4.7.1  Proof  of  Lemma  10 

Fix a set $A \subset \mathbb{R}^n$ with diameter $2r$. That is, for any $x, y \in A$, $\| x - y \|_2 \le 2r$. Consider the set sum

$$A - A = \{ x - y : x, y \in A \}.$$

Certainly for any point $z \in A - A$, $\| z \|_2 \le 2r$. Therefore, $A - A$ is contained in the $n$-ball of radius $2r$. Let $C_n$ be the volume of a unit $n$-ball, so an $n$-ball of radius $r$ has volume $C_n r^n$. Hence

$$\operatorname{Vol}(A - A) \le C_n (2r)^n = 2^n C_n r^n. \tag{4.108}$$

The Brunn-Minkowski inequality [92] states that for any $A, B \subset \mathbb{R}^n$,

$$\operatorname{Vol}(A + B)^{1/n} \ge \operatorname{Vol}(A)^{1/n} + \operatorname{Vol}(B)^{1/n}.$$

Therefore

$$\operatorname{Vol}(A - A) \ge \big[ \operatorname{Vol}(A)^{1/n} + \operatorname{Vol}(-A)^{1/n} \big]^n = 2^n \operatorname{Vol}(A). \tag{4.109}$$

Combining (4.108) with (4.109) gives

$$\operatorname{Vol}(A) \le C_n r^n.$$

That is, the volume of $A$ is no more than that of an $n$-ball with the same diameter.


4.8  Asymptotic  Results  for  the  Quadratic  Gaussian  Prob¬ 
lem 


The  following  theorem  bounds  the  asymptotic  proportionality  constant  K(/3). 


Theorem 14 For a fraction $\beta$ of traitors

$$\frac{\sigma_N^2}{2\sigma_X^2}\,\frac{1}{1-2\beta} \;\le\; K(\beta) \;\le\; \frac{\sigma_N^2}{2\sigma_X^2}\,\frac{\sqrt{1-\beta}+\sqrt{\beta}}{(1-\beta)\bigl(\sqrt{1-\beta}-\sqrt{\beta}\bigr)}. \qquad (4.110)$$

At $\beta = 0$, the two bounds meet at $\sigma_N^2/(2\sigma_X^2)$, matching the result of [54]. They
also both diverge at $\beta = 1/2$. The ratio between them is monotonically increasing
in $\beta$ and is never more than 4. The proof is stated in the next two subsections,
and both sides make use of the bounds already found on the rate-distortion region
in Sec. 4.6 and Sec. 4.7.
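The stated properties of the two bounds can be checked numerically. The sketch below assumes that the lower bound of Theorem 14 reads $\frac{\sigma_N^2}{2\sigma_X^2}\frac{1}{1-2\beta}$ and that the upper bound reads $\frac{\sigma_N^2}{2\sigma_X^2}\frac{\sqrt{1-\beta}+\sqrt{\beta}}{(1-\beta)(\sqrt{1-\beta}-\sqrt{\beta})}$. The function names and the default variances are illustrative choices, not part of the theorem.

```python
import math

def lower_bound(beta: float, var_n: float = 1.0, var_x: float = 1.0) -> float:
    """Assumed lower bound on K(beta), valid for 0 <= beta < 1/2."""
    return var_n / (2.0 * var_x) / (1.0 - 2.0 * beta)

def upper_bound(beta: float, var_n: float = 1.0, var_x: float = 1.0) -> float:
    """Assumed upper bound on K(beta), valid for 0 <= beta < 1/2."""
    s, t = math.sqrt(1.0 - beta), math.sqrt(beta)
    return var_n / (2.0 * var_x) * (s + t) / ((1.0 - beta) * (s - t))

def bound_ratio(beta: float) -> float:
    """Ratio of upper to lower bound; the variances cancel."""
    return upper_bound(beta) / lower_bound(beta)
```

Under these assumed forms the ratio simplifies to $(\sqrt{1-\beta}+\sqrt{\beta})^2/(1-\beta)$, which equals 1 at $\beta = 0$ and approaches 4 as $\beta \to 1/2$.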


4.8.1  Proof of the Upper Bound on the Asymptotic Proportionality Constant


We apply Theorem 12. For a given sum-rate $R$, let $R_i = R/L$ for all $i$. Let $r$ be
the largest possible value satisfying (4.93) where $r_i = r/L$. It is not hard to show
that for large L and R, r is nearly equal to R.


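We need to specify the function $q(\Sigma)$. First define for all $A \subset \{1, \ldots, L\}$

$$\hat{X}_A = E(X \mid U_A) = \frac{\sum_{i \in A} U_i}{\dfrac{\sigma_N^2}{\sigma_X^2\,(1 - \exp(-2r/L))} + |A|}.$$

When $X$ and $U_A$ are related according to the nominal distribution,

$$E(X - \hat{X}_A)^2 = \frac{1}{\dfrac{1}{\sigma_X^2} + \dfrac{|A|\,(1 - \exp(-2r/L))}{\sigma_N^2}}.$$

If we fix $|A|/L$, for large L and R,

$$E(X - \hat{X}_A)^2 \approx \frac{\sigma_N^2}{|A|\,(1 - \exp(-2r/L))} \approx \frac{\sigma_N^2 L}{2r\,|A|} \approx \frac{\sigma_N^2 L}{2R\,|A|}.$$

Also observe that if $B \subset A$,

$$E(\hat{X}_A - \hat{X}_B)^2 = E(X - \hat{X}_B)^2 - E(X - \hat{X}_A)^2.$$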

We choose the $c_i$ in the following way. Given $\Sigma$, we look for a set $\hat{H} \subset \{1, \ldots, L\}$
of size $(1 - \beta)L$ that has the expected distribution if $\hat{H}$ were the set of honest
agents. That is, for all $i, j \in \hat{H}$,

$$\Sigma_{i,j} = \sigma_X^2 + \frac{\sigma_N^2}{1 - \exp(-2r/L)}\,\delta_{i,j}.$$

If there is more than one such $\hat{H}$, choose between them arbitrarily. Then define $c_i$
such that

$$\sum_{i=1}^{L} c_i U_i = \hat{X}_{\hat{H}}.$$

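Now we show that this choice achieves the upper bound given in Theorem 14. In
the worst case, the true set of honest agents $H$ shares just $(1 - 2\beta)L$ agents with
$\hat{H}$. Because $U_{\hat{H}}$ is distributed according to the nominal distribution,

$$E(\hat{X}_{\hat{H}} - \hat{X}_{\hat{H} \cap H})^2 = E(X - \hat{X}_{\hat{H} \cap H})^2 - E(X - \hat{X}_{\hat{H}})^2 \approx \frac{\sigma_N^2}{2R}\left(\frac{L}{|\hat{H} \cap H|} - \frac{L}{|\hat{H}|}\right) \le \frac{\sigma_N^2}{2R}\left(\frac{1}{1-2\beta} - \frac{1}{1-\beta}\right).$$

Furthermore, since $\hat{H} \cap H$ contains only honest agents,

$$E(\hat{X}_{\hat{H} \cap H} - X)^2 \approx \frac{\sigma_N^2 L}{2R\,|\hat{H} \cap H|} \le \frac{\sigma_N^2}{2R}\,\frac{1}{1-2\beta}.$$

Therefore by the Cauchy-Schwarz inequality

$$E(\hat{X}_{\hat{H}} - X)^2 \le \left(\sqrt{E(\hat{X}_{\hat{H}} - \hat{X}_{\hat{H} \cap H})^2} + \sqrt{E(\hat{X}_{\hat{H} \cap H} - X)^2}\right)^2$$

$$\le \frac{\sigma_N^2}{2R}\left(\sqrt{\frac{1}{1-2\beta} - \frac{1}{1-\beta}} + \sqrt{\frac{1}{1-2\beta}}\right)^2$$

$$= \frac{\sigma_N^2}{2R}\,\frac{\sqrt{1-\beta}+\sqrt{\beta}}{(1-\beta)\bigl(\sqrt{1-\beta}-\sqrt{\beta}\bigr)}.$$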

Therefore, for large L and R,

$$\frac{R\,E(\hat{X}_{\hat{H}} - X)^2}{\sigma_X^2} \le \frac{\sigma_N^2}{2\sigma_X^2}\,\frac{\sqrt{1-\beta}+\sqrt{\beta}}{(1-\beta)\bigl(\sqrt{1-\beta}-\sqrt{\beta}\bigr)}.$$


4.8.2  Proof of the Lower Bound on the Asymptotic Proportionality Constant


We apply Theorem 13. Let $r = \sum_{i=1}^{L} r_i$. Certainly

Observe that

CHAPTER  5

MALICIOUS  DATA  ATTACKS  ON  POWER  SYSTEM  STATE 

ESTIMATION 


5.1  Introduction 

Since the beginning of the development of power system state estimation [69], it
has been necessary to deal with bad data. Traditionally, bad data were assumed
to be caused by random errors resulting from a fault in a power meter and/or its
attendant communication system. These errors are modeled by a change of variance
in Gaussian noise, which leads to an energy (L2) detector. In this chapter, we
study the problem in which several meters are seized by an adversary that is able to
corrupt the measurements from those meters that are received by the control center.
This differs from previous investigations of the problem in that the malicious
data at various meters can be simultaneously crafted by the adversary to defeat
the state estimator, as opposed to independent errors caused by random faults.

This problem was first studied in [78], in which it was observed that there exist
cooperative and malicious attacks on meters that all known bad data techniques
will fail to detect. The authors of [78] gave a method to adjust measurements
at just a few meters in the grid in such a way that the bad data detector will fail
to perceive the corruption of the data. In the sequel, we refer to these attacks as
unobservable attacks, as they are closely related to the classical notion of
unobservability of an estimation problem. We regard the existence of unobservable
attacks as a fundamental limit on the ability to detect malicious data attacks. We
therefore study the problem in two regimes: when the adversary can execute an
unobservable attack, and when it cannot or does not. In Sec. 5.3, we study the
former case, by characterizing the conditions under which an unobservable attack
exists, and giving an efficient algorithm for finding small unobservable attacks.
This can provide some insight into how vulnerable a given power network is to
such an attack.

In the regime where an unobservable attack cannot be performed, it is possible for
the control center to detect malicious data attacks. Moreover, it is less clear what
the worst attacks are for the adversary. Therefore we study two aspects of
the problem: (i) attack detection and localization strategies at the control center;
(ii) attack strategies by the adversary.

We present in Sec. 5.4 a decision theoretic formulation of detecting malicious
data injection by an adversary. Because the adversary can choose where to attack
the network and design the injected data, the problem of detecting malicious data
cannot be formulated as a simple hypothesis test, and the uniformly most powerful
test does not exist in general. We propose a detector based on the generalized
likelihood ratio test (GLRT). The GLRT is not optimal in general, but it is known
to perform well in practice and it has well established asymptotic optimality [87,
88, 89]. In other words, if the detector has many data samples, the detection
performance of the GLRT is close to optimal.

We note that the proposed detector has a different structure from those used in
conventional bad data detectors, which usually employ a test on the state estimator
residue errors [69, 70, 94]. The proposed GLRT detector does not explicitly
compute the residue error. We show, however, that when there is at most one
attacked meter (a single attacked data), the GLRT is identical to the classical
largest normalized residue (LNR) test using the residue error from the minimum
mean square error (MMSE) state estimator. The asymptotic optimality of the GLRT
lends a stronger theoretical basis to the LNR test for the single bad data case.

Next we investigate malicious data attacks from the perspective of an adversary
who must make a tradeoff between inflicting the maximum damage on state estimation
and being detected by the EMS at the control center. We define in Sec. 5.5 the
notion of Attacker Operating Characteristic (AOC) that characterizes the tradeoff
between the probability of being detected and the resulting (extra) mean-square error
at the state estimator. We therefore formulate the problem of optimal attack as
minimizing the probability of being detected subject to causing the mean square
error (MSE) to increase beyond a predetermined level. Finding the attack with the
optimal AOC is intractable, unfortunately. We present a heuristic that allows us
to obtain attacks with minimum attack power leakage to the detector while
increasing the mean square error at the state estimator beyond a predetermined
objective. This heuristic reduces to an eigenvalue problem that can be solved
offline.

Finally, in Sec. 5.6 we conduct numerical simulations on a small scale example
using the IEEE 14 bus network. For the control center, we present simulation
results that compare different detection schemes based on the Receiver Operating
Characteristics (ROC) that characterize the tradeoff between the probability of
attack detection and the probability of false alarm. We show that there is a
substantial difference between the problem of detecting randomly appearing bad
data and that of detecting malicious data injected by an adversary. Next we compare
the GLRT detector with two classical detection schemes: the $J(\hat{x})$ detector and
the (Bayesian) largest normalized residue (LNR) detector [69, 70]. Our test shows
improvement over the two well established detection schemes. From the adversary's
perspective, we compare the Attacker Operating Characteristics (AOC). Our results
show again that the GLRT detector gives a higher probability of detection than
conventional detectors for the same amount of MSE increase at the
state estimator.


5.2  Problem  Formulation 

A power system is composed of a collection of busses, transmission lines, and power
flow meters. We adopt a graph-theoretic model for such a system. The
power system is modeled as an undirected graph (V, E), where V represents the
set of busses, and E is the set of transmission lines. Each line connects two busses,
so each element $e \in E$ is an unordered pair of busses in V. Fig. 5.1 shows the graph
structure of the IEEE 14-bus test system, which we use in our simulations. The
control center receives measurements from various meters deployed throughout
the system, from which it performs state estimation. Meters come in two varieties:
transmission line flow meters, which measure the power flow through a single
transmission line, and bus injection meters, which measure the total outgoing
flow on all transmission lines connected to a single bus. Therefore each meter is
associated with either a bus in V or a line in E. We allow for the possibility of
multiple meters on the same bus or line. Indeed, in our simulations, we assume
that a meter is placed on every bus, and two meters on every line, one in each
direction.

Figure 5.1: IEEE 14 bus test system.

The graph-theoretic model for the power system yields the following DC power
flow model, a linearized version of the AC power flow model [95]:

$$z = Hx + a + e, \qquad (5.1)$$

$$e \sim \mathcal{N}(0, \Sigma_e),$$

$$a \in \mathcal{A}_s = \{a \in \mathbb{R}^m : \|a\|_0 \le s\},$$

where $z \in \mathbb{R}^m$ is the vector of power flow measurements, $x \in \mathbb{R}^n$ is the system
state, $e$ is the Gaussian measurement noise with zero mean and covariance matrix
$\Sigma_e$, and vector $a$ is malicious data injected by an adversary. Here we assume that
the adversary can control at most s meters. That is, $a$ is a vector with at most
s non-zero entries ($\|a\|_0 \le s$). A vector $a$ is said to have sparsity s if $\|a\|_0 = s$.
The H matrix in (5.1) arises from the graph theoretic model as follows. For each
transmission line $(b_1, b_2) \in E$, the DC power flow through this line from bus $b_1$ to
bus $b_2$ is given by

$$\bigl[\,0 \;\cdots\; 0 \;\; \underbrace{Y_{(b_1,b_2)}}_{b_1\text{th element}} \;\; 0 \;\cdots\; 0 \;\; \underbrace{-Y_{(b_1,b_2)}}_{b_2\text{th element}} \;\; 0 \;\cdots\; 0\,\bigr]\, x \qquad (5.2)$$

where $Y_{(b_1,b_2)}$ is the susceptance of the transmission line $(b_1, b_2)$. Let $h_{(b_1,b_2)}$ be the
row vector in (5.2). If a meter measures the flow through the transmission line
connecting busses $b_1$ and $b_2$, then the associated row of H is given by $h_{(b_1,b_2)}$. If
a meter measures the power injection for bus $b_1$, then the associated row of H is
given by

$$\sum_{b_2 : (b_1, b_2) \in E} h_{(b_1,b_2)}. \qquad (5.3)$$
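The construction of H from the network graph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the simulations in this chapter: busses are indexed from 0, each line is stored once as a pair with a susceptance value, flow meters are given as directed pairs, and the helper names `line_row` and `build_H` as well as the toy 3-bus network are hypothetical.

```python
import numpy as np

def line_row(n_bus, b1, b2, y):
    """Row vector h_(b1,b2): +y at bus b1 and -y at bus b2 (0-based indices)."""
    h = np.zeros(n_bus)
    h[b1] = y
    h[b2] = -y
    return h

def build_H(n_bus, lines, flow_meters, injection_meters):
    """Stack one row per meter: line-flow rows first, then bus-injection rows.

    lines: dict mapping an undirected line (u, v) to its susceptance.
    flow_meters: list of directed pairs (b1, b2), flow measured from b1 to b2.
    injection_meters: list of busses carrying an injection meter.
    """
    rows = []
    for b1, b2 in flow_meters:
        y = lines[(b1, b2)] if (b1, b2) in lines else lines[(b2, b1)]
        rows.append(line_row(n_bus, b1, b2, y))
    for b in injection_meters:
        row = np.zeros(n_bus)
        for (u, v), y in lines.items():
            if u == b:
                row += line_row(n_bus, u, v, y)   # outgoing flow b -> v
            elif v == b:
                row += line_row(n_bus, v, u, y)   # outgoing flow b -> u
        rows.append(row)
    return np.array(rows)

# Toy 3-bus chain 0 - 1 - 2 with made-up susceptances.
lines = {(0, 1): 1.0, (1, 2): 2.0}
H = build_H(3, lines, flow_meters=[(0, 1), (1, 0)], injection_meters=[1])
```

Every row built this way sums to zero, so the all-ones vector lies in the null space of H. In practice one bus is usually taken as a reference and its column dropped; that convention is an assumption of this note rather than something stated above.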


5.2.1  A  Bayesian  Framework  and  MMSE  Estimation 

We consider in this chapter a Bayesian framework where the state variables are random
vectors with Gaussian distribution $\mathcal{N}(\mu_x, \Sigma_x)$. We assume that, in practice,
the mean $\mu_x$ and covariance $\Sigma_x$ can be estimated from historical data. By
subtracting the mean from the data, we can assume without loss of generality that
$\mu_x = 0$.

In the absence of an attack, i.e. a = 0 in (5.1), (z, x) are jointly Gaussian. The
minimum mean square error (MMSE) estimator of the state vector x is a linear
estimator given by

$$\hat{x}(z) = \arg\min_{\hat{x}} E\bigl(\|x - \hat{x}(z)\|^2\bigr) = Kz \qquad (5.4)$$

where

$$K = \Sigma_x H^T (H \Sigma_x H^T + \Sigma_e)^{-1}. \qquad (5.5)$$

The minimum mean square error, in the absence of attack, is given by

$$\mathcal{E}_0 = \min_{\hat{x}} E\bigl(\|x - \hat{x}(z)\|^2\bigr) = \mathrm{Tr}(\Sigma_x - K H \Sigma_x).$$

If an adversary injects malicious data $a \in \mathcal{A}_s$ but the control center is unaware
of it, then the state estimator defined in (5.4) is no longer the true MMSE estimator
(in the presence of attack); the estimator $\hat{x} = Kz$ is a “naive” MMSE estimator
that ignores the possibility of attack, and it will incur a higher mean square error
(MSE). In particular, it is not hard to see that the MSE in the presence of $a$ is
given by

$$\mathcal{E}_0 + \|Ka\|_2^2. \qquad (5.6)$$

The impact on the estimator from a particular attack $a$ is given by the second term
in (5.6). To increase the MSE at the state estimator, the adversary necessarily has
to increase the “energy” of attack, which increases the probability of being detected
at the control center.
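The decomposition of the MSE into a baseline term plus the attack term $\|Ka\|_2^2$ can be verified numerically. The sketch below is illustrative only: the matrices are random placeholders, the function names are hypothetical, and the attacked MSE is computed from first principles by writing $x - Kz = (I - KH)x - Ke - Ka$ and using the fact that x and e are zero mean and independent.

```python
import numpy as np

def mmse_gain(H, Sx, Se):
    """Linear MMSE gain K = Sx H^T (H Sx H^T + Se)^{-1}."""
    Sz = H @ Sx @ H.T + Se
    return Sx @ H.T @ np.linalg.inv(Sz)

def baseline_mse(H, Sx, Se):
    """MMSE without attack: Tr(Sx - K H Sx)."""
    K = mmse_gain(H, Sx, Se)
    return float(np.trace(Sx - K @ H @ Sx))

def naive_mse_under_attack(H, Sx, Se, a):
    """MSE of x_hat = K z when z = H x + a + e, from first principles."""
    K = mmse_gain(H, Sx, Se)
    E = np.eye(Sx.shape[0]) - K @ H            # error = E x - K e - K a
    random_part = np.trace(E @ Sx @ E.T + K @ Se @ K.T)
    return float(random_part + np.sum((K @ a) ** 2))

# Placeholder model: 5 meters, 3 state variables, made-up covariances.
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 3))
Sx = np.eye(3)
Se = 0.1 * np.eye(5)
a = np.array([0.0, 2.0, 0.0, 0.0, -1.0])      # 2-sparse attack vector
K = mmse_gain(H, Sx, Se)
```

For this placeholder model, `naive_mse_under_attack(H, Sx, Se, a)` agrees with `baseline_mse(H, Sx, Se) + np.sum((K @ a) ** 2)` up to rounding.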


5.3  Unobservable  Attacks 

Liu, Ning and Reiter observe in [78] that if there exists a nonzero s-sparse a for
which a = Hc for some c, then

z = Hx + a + e = H(x + c) + e.

Therefore x cannot be distinguished from x + c at the control center. If both x
and x + c are valid network states, the adversary's injection of data a when the
true state is x will lead the control center to believe that the true network state
is x + c, and vector c can be scaled arbitrarily. Since no detector can distinguish
x from x + c, we call hereafter an attack vector a unobservable if it has the form
a = Hc.

Note that it is unlikely that random bad data a will satisfy a = Hc. But an
adversary can synthesize its attack vector to satisfy the unobservable condition.

5.3.1  Characterization of Unobservable Attacks

The  following  theorem  demonstrates  that  this  type  of  attack  is  closely  related  to 
the  classical  notion  of  network  observability  [75]. 

Theorem 15 An s-sparse attack vector a comprises an unobservable attack if
and only if the network becomes unobservable when the s meters associated with
the nonzero entries of a are removed from the network; that is, the $(m - s) \times n$
submatrix of H taken from the rows of H corresponding to the zero entries of a
does not have full column rank.

Proof: Without loss of generality, let H be partitioned into $H^T = [H_1^T \mid H_2^T]$,
where $H_1$ has $m - s$ rows, and suppose the submatrix $H_1$ does not have full column
rank, i.e. there exists a vector $c \ne 0$ such that $H_1 c = 0$. We now have
$a = Hc \in \mathcal{A}_s$, which is unobservable by definition.
Conversely, consider an unobservable $a = Hc \in \mathcal{A}_s$. Without loss of generality, we
can assume that the first $m - s$ entries of a are zero. We therefore have $H_1 c = 0$
where $H_1$ is the submatrix made of the first $m - s$ rows of H. □

The implication of the above theorem is that the attack discovered in [78]
is equivalent to removing s meters from the network, thus making the network
unobservable.
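The rank condition of Theorem 15 suggests a direct numerical test for a given set of compromised meters. The following Python sketch is one possible way to carry it out, under the assumption that H itself has full column rank (for example after dropping the column of a reference bus). The function name `unobservable_attack` and the tolerance are hypothetical choices.

```python
import numpy as np

def unobservable_attack(H, compromised, tol=1e-10):
    """Return a = H c supported on the compromised meters, or None.

    H: m x n measurement matrix, assumed to have full column rank.
    compromised: indices of the meters controlled by the adversary.
    """
    m, n = H.shape
    compromised = set(compromised)
    keep = [i for i in range(m) if i not in compromised]
    if not keep:
        raise ValueError("at least one meter must remain uncompromised")
    H1 = H[keep, :]                              # rows of the untouched meters
    _, svals, Vt = np.linalg.svd(H1, full_matrices=True)
    rank = int(np.sum(svals > tol))
    if rank == n:
        return None                              # still observable: no such attack
    c = Vt[-1]                                   # unit vector with H1 c = 0
    return H @ c                                 # zero on all untouched meters
```

When the remaining rows lose full column rank, the last right singular vector c satisfies $H_1 c = 0$, so $a = Hc$ is nonzero only on the compromised meters, and it is nonzero because H has full column rank.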

Note that even though an unobservable attack is equivalent to the network
being made unobservable, the adversarial attack is still much more destructive.
When the network is unobservable because there are insufficient meters, the control
center can easily determine this; it knows exactly which aspects of the system
state it can gather information about, and which it cannot. However, in the case
of an unobservable adversarial attack, the control center does not know it is under
attack, nor which of several possible attacks is being executed. Therefore the
situation is much more precarious, because the control center does not even know
what it does not know.

5.3.2  Graph-Theoretic Approach to Minimum Size Unobservable Attacks

To  know  how  susceptible  a  power  system  is  to  this  highly  damaging  unobservable 
attack,  it  is  important  to  know  how  few  meters  must  be  controlled  by  the  adversary 
before  the  attack  can  be  performed.  From  Theorem  15,  we  know  that  there  is  an 
unobservable  s-sparse  attack  vector  a  if  and  only  if  it  is  possible  to  remove  s  rows 
from  H  and  cause  H  not  to  have  full  column  rank.  Finding  the  minimum  such  s 
for  an  arbitrary  H  is  a  hard  problem.  However,  it  becomes  easier  given  the  extra 
structure  on  H  imposed  by  the  network  topology. 

We now give a simple method to find sets of meters whose removal makes the
system unobservable. Moreover, we show that it is possible to efficiently minimize
the size of the set of meters produced by this method; thereby one may efficiently
compute small sets of meters from which an adversary may execute an unobservable
attack.

For a set of lines $A \subset E$, let $g(A)$ be the set of meters either on lines in A or on
busses adjacent to lines in A. Let $h(A)$ be the number of connected components in
the graph $(V, E \setminus A)$; i.e. the original graph after all lines in A have been removed.
The following theorem gives a simple method for determining a number of meters
in $g(A)$ to remove from the network to make it unobservable. The proof relies on
[77], which gave an efficient method to determine the observability of a network
based only on its topology.

Theorem 16 (Sufficient condition for unobservable attacks) For all $A \subset E$,
removing an arbitrary subset of $g(A)$ of size $|g(A)| - h(A) + 2$ makes the system
unobservable.

Proof: Let $\bar{V}$ and $\bar{E}$ be the sets of busses and lines, respectively, with a meter
placed on them. Theorem 5 in [77] states that the power system given by
$(V, E, \bar{V}, \bar{E})$ is observable if and only if there exists a $\mathcal{T} \subset E$ comprising a spanning
tree of V and an assignment function

$$\phi : \mathcal{T} \to \bar{V} \cup \bar{E} \qquad (5.7)$$

satisfying:

1. If $l \in \bar{E}$, then $\phi(l) = l$.

2. If $\phi(l) \in \bar{V}$, then line $l$ is incident to the bus $\phi(l)$.

3. If $l_1, l_2 \in \mathcal{T}$ are distinct, then $\phi(l_1) \ne \phi(l_2)$.

The principle behind this theorem is that a bus injection meter may “impersonate”
a single line meter on a line incident to the bus. If a bus $b = \phi(l)$ for some line
$l$, this represents the meter at $b$ impersonating a meter on line $l$. The system is
observable if and only if a spanning tree $\mathcal{T}$ exists made up of transmission lines
with either real meters or meters impersonated by bus meters.

Not including the lines in A, the network splits into $h(A)$ separate pieces.
Therefore, any spanning tree $\mathcal{T}$ must include at least $h(A) - 1$ lines in A. Any
assignment $\phi$ satisfying the conditions above must therefore employ at least
$h(A) - 1$ meters in $g(A)$. Hence, if any $|g(A)| - h(A) + 2$ of these meters are removed from
the network, only $h(A) - 2$ remain, which is not enough to create a full spanning
tree, so the network becomes unobservable. □

Example: Consider the IEEE 14-bus test system, shown in Fig. 5.1. Take
A = {(7, 8)}. Since bus 8 is only connected to the system through bus 7, removing
this line from the network cuts it into two pieces. Therefore $h(A) = 2$. The
set of meters $g(A)$ consists of meters on the line (7, 8), and bus injection meters
at busses 7 and 8. Theorem 16 states that if we remove $|g(A)|$ meters from this
set (that is, all the meters in $g(A)$), the system becomes unobservable. In our
simulation examples, we assume there are two meters on each line; therefore it
takes 4 meters to execute an unobservable attack. Furthermore, it is not hard to
employ Theorem 16 to find similar 4-sparse unobservable attacks on the 30-bus,
118-bus, and 300-bus test systems.

Theorem 16 provides a method to find unobservable attacks, but we would like
to find attacks using as few meters as possible. We use the theory of submodular
functions to show that the quantity $|g(A)| - h(A) + 2$ can be efficiently minimized
over all sets of edges A. This significantly increases the usefulness of Theorem 16,
because it means we can efficiently find small unobservable attacks for arbitrary
power systems.

A submodular function is a real-valued function $f$ defined on the collection of
subsets of a set W such that for any $A, B \subset W$,

$$f(A \cup B) + f(A \cap B) \le f(A) + f(B). \qquad (5.8)$$

Moreover, a function $f$ is supermodular if $-f$ is submodular. There are several
known techniques to find the set $A \subset W$ minimizing $f(A)$ in time polynomial in
the size of W [84, 85, 86]. It is not hard to see that $|g(A)|$ is submodular in A,
and $h(A)$ is supermodular. Therefore, their difference is submodular, so it can be
efficiently minimized.
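For small networks the quantity $|g(A)| - h(A) + 2$ can simply be evaluated for every set of lines A. The Python sketch below does exactly that by brute force; it is not the polynomial-time submodular minimization of [84, 85, 86], and it is only meant to make the definitions of $g(A)$ and $h(A)$ concrete. It assumes one injection meter per bus and two flow meters per line, as in our simulations, and it skips sets A with $h(A) < 2$, for which the count would exceed $|g(A)|$. All function names and the toy graph are hypothetical.

```python
from itertools import combinations

def components(vertices, edges):
    """Number of connected components, via union-find with path halving."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})

def attack_size(vertices, edges, A, meters_per_line=2, meters_per_bus=1):
    """|g(A)| - h(A) + 2, or None when removing A leaves the graph connected."""
    A = set(A)
    busses = {b for line in A for b in line}
    g = meters_per_line * len(A) + meters_per_bus * len(busses)
    h = components(vertices, [e for e in edges if e not in A])
    return g - h + 2 if h >= 2 else None

def smallest_attack(vertices, edges, **meter_counts):
    """Exhaustive search over nonempty line sets; exponential in |E|."""
    best = None
    for k in range(1, len(edges) + 1):
        for A in combinations(edges, k):
            size = attack_size(vertices, edges, A, **meter_counts)
            if size is not None and (best is None or size < best[0]):
                best = (size, A)
    return best
```

On a toy graph with a triangle on busses 1, 2, 3 and a pendant bus 4 attached to bus 3, the search selects the single line (3, 4) and reports 4 meters, mirroring the line (7, 8) in the 14-bus example.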


5.4  Detection  of  Malicious  Data  Attack 

In  this  section,  we  study  the  problem  in  the  regime  that  the  adversary  cannot  or 
does  not  perform  an  unobservable  attack  as  described  in  Sec.  5.3.  In  this  regime, 
it  is  possible  to  detect  the  adversary’s  presence.  We  first  formulate  the  detection 
problem,  then  introduce  the  generalized  likelihood  ratio  test  (GLRT),  as  well  as 
some  classical  detectors. 

5.4.1  Statistical  Model  and  Attack  Hypotheses 

We now present a formulation of the detection problem at the control center. We
assume a Bayesian model where the state variables are random with a multivariate
Gaussian distribution $x \sim \mathcal{N}(0, \Sigma_x)$. Our detection model, on the other hand, is
not Bayesian in the sense that we do not assume any prior probability of the attack
nor do we assume any statistical model for the attack vector a.

Under the observation model (5.1), we consider the following composite binary
hypothesis:

$$\mathcal{H}_0 : a = 0 \quad \text{versus} \quad \mathcal{H}_1 : a \in \mathcal{A}_s \setminus \{0\}. \qquad (5.9)$$

Given observation $z \in \mathbb{R}^m$, we wish to design a detector $\delta : \mathbb{R}^m \to \{0, 1\}$ with
$\delta(z) = 1$ indicating a detection of attack ($\mathcal{H}_1$) and $\delta(z) = 0$ the null hypothesis.

An alternative formulation, one we will not pursue here, is based on the extra
MSE $\|Ka\|_2^2$ at the state estimator; see (5.6). In particular, we may want to
distinguish, for $\|a\|_0 \le s$,

$$\mathcal{H}'_0 : \|Ka\|_2^2 \le C \quad \text{versus} \quad \mathcal{H}'_1 : \|Ka\|_2^2 > C. \qquad (5.10)$$

Here both null and alternative hypotheses are composite and the problem is more
complicated. The operational interpretation, however, is significant because one
may not care in practice about small attacks that only marginally increase the
MSE of the state estimator.

5.4.2  Generalized Likelihood Ratio Detector with L1 Norm Regularization

For the hypothesis test given in (5.9), the uniformly most powerful test does not
exist. We propose a detector based on the generalized likelihood ratio test (GLRT).
We note in particular that, if we have multiple measurements under the same a,
the GLRT proposed here is asymptotically optimal in the sense that it offers the
fastest decay rate of miss detection probability [96].

The distributions of the measurement z under the two hypotheses differ only in
their means:

$$\mathcal{H}_0 : z \sim \mathcal{N}(0, \Sigma_z)$$

$$\mathcal{H}_1 : z \sim \mathcal{N}(a, \Sigma_z), \quad a \in \mathcal{A}_s \setminus \{0\}$$

where $\Sigma_z = H \Sigma_x H^T + \Sigma_e$.


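The GLRT is given by

$$L(z) \triangleq \frac{\max_{a \in \mathcal{A}_s} f(z \mid a)}{f(z \mid a = 0)} \;\underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\; \tau, \qquad (5.11)$$

where $f(z \mid a)$ is the Gaussian density function with mean $a$ and covariance $\Sigma_z$,
and the threshold $\tau$ is chosen under the null hypothesis for a certain false alarm
rate. This is equivalent to

$$\min_{a \in \mathcal{A}_s} \; a^T \Sigma_z^{-1} a - 2 z^T \Sigma_z^{-1} a \;\underset{\mathcal{H}_1}{\overset{\mathcal{H}_0}{\gtrless}}\; \tau. \qquad (5.12)$$

Thus the GLRT reduces to solving

$$\begin{aligned} \text{minimize} \quad & a^T \Sigma_z^{-1} a - 2 z^T \Sigma_z^{-1} a \\ \text{subject to} \quad & \|a\|_0 \le s. \end{aligned} \qquad (5.13)$$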

For a fixed sparsity pattern, i.e. if we know the support but not necessarily
the actual values of a, the above optimization is easy to solve. In other words,
if we know a small set of suspect meters from which malicious data may be injected,
the above test is easily computable. The sparsity condition on a makes the above
optimization problem non-convex, but for small s it can be solved exactly simply
by exhaustively searching through all sparsity patterns. For larger s, this is not
feasible. It is a well known technique that (5.13) can be approximated by a convex
optimization:

$$\begin{aligned} \text{minimize} \quad & a^T \Sigma_z^{-1} a - 2 z^T \Sigma_z^{-1} a \\ \text{subject to} \quad & \|a\|_1 \le \nu \end{aligned} \qquad (5.14)$$

where the L1 norm constraint is a heuristic for the sparsity of a. The constant $\nu$
needs to be adjusted until the solution involves an a with sparsity s. This requires
solving (5.14) several times. A similar approach was taken in [79].
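The exhaustive search over sparsity patterns mentioned above can be sketched as follows. For a fixed support S, write $M = \Sigma_z^{-1}$ and $w = Mz$; the objective restricted to S is minimized by $a_S = M_{SS}^{-1} w_S$, with minimum value $-w_S^T M_{SS}^{-1} w_S$. Since enlarging the support can only lower the minimum, it suffices to scan supports of size exactly s. The function name is hypothetical and the code assumes $s \le m$; it is a minimal illustration rather than the implementation used in Sec. 5.6.

```python
import numpy as np
from itertools import combinations

def glrt_exhaustive(z, Sz, s):
    """Minimum of a^T M a - 2 z^T M a over s-sparse a, with M = Sz^{-1}.

    Returns (minimum value, minimizing support). The cost grows as
    (m choose s), so this is practical only for small s.
    """
    M = np.linalg.inv(Sz)
    w = M @ z
    best_value, best_support = np.inf, None
    for support in combinations(range(len(z)), s):
        idx = list(support)
        w_S = w[idx]
        M_SS = M[np.ix_(idx, idx)]
        # Closed-form minimum for this support: -w_S^T M_SS^{-1} w_S.
        value = -float(w_S @ np.linalg.solve(M_SS, w_S))
        if value < best_value:
            best_value, best_support = value, support
    return best_value, best_support
```

The returned value is the left-hand side of the test; the returned support also indicates which meters are most suspect, which is useful for attack localization.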


5.4.3  Classical  Detectors  with  MMSE  State  Estimation 

We will compare the performance of the GLRT detector with two classical bad
data detectors [69, 70], both based on the residual error $r = z - H\hat{x}$ resulting from
the MMSE state estimator.

The first is the $J(\hat{x})$ detector, given by

$$r^T \Sigma_e^{-1} r \;\underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\; \tau. \qquad (5.15)$$

The second is the largest normalized residue (LNR) test given by

$$\max_i \frac{|r_i|}{\sigma_{r_i}} \;\underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\; \tau, \qquad (5.16)$$

where $\sigma_{r_i}$ is the standard deviation of the ith residual error $r_i$. We may regard
this as a test on the $\ell_\infty$-norm of the measurement residual, normalized so that each
element has unit variance.

The asymptotic optimality of the GLRT detector implies a better performance
of the GLRT over the above two detectors when the sample size is large. For the
finite sample case, numerical simulations shown in Sec. 5.6 confirm that the GLRT
detector improves on the performance of the $J(\hat{x})$ and LNR detectors. The interesting
exception is the case when only one meter is under attack, i.e. $\|a\|_0 = 1$, and
$\Sigma_e = \sigma_e^2 I$. In this case, the GLRT turns out to be identical to the LNR detector.
Therefore, the GLRT can be viewed as a generalization of the LNR detector, in
that it can be tuned to any sparsity level. Moreover, this provides some theoretical
justification for the LNR detector. The equivalence of the two detectors is stated
and proved in the following Proposition.

Proposition 1 When s = 1, the GLRT detector given in (5.12) is equivalent to
the LNR detector given in (5.16).

Proof: If s = 1, the left hand side of (5.12) becomes

$$\min_i \min_{a_i} \; (\Sigma_z^{-1})_{ii}\, a_i^2 - 2 z^T (\Sigma_z^{-1})_i\, a_i \qquad (5.17)$$

where $(\Sigma_z^{-1})_{ii}$ is the ith diagonal element of $\Sigma_z^{-1}$, and $(\Sigma_z^{-1})_i$ is the ith row of $\Sigma_z^{-1}$.
The second minimization can be solved in closed form, so (5.17) becomes

$$-\max_i \frac{\bigl(z^T (\Sigma_z^{-1})_i\bigr)^2}{(\Sigma_z^{-1})_{ii}}. \qquad (5.18)$$

We may therefore write the GLRT as

$$\max_i \frac{\bigl|z^T (\Sigma_z^{-1})_i\bigr|}{\sqrt{(\Sigma_z^{-1})_{ii}}} \;\underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\; \tau. \qquad (5.19)$$

The vector of numerators in (5.19) is given by $r' = \Sigma_z^{-1} z$. Note that the covariance
matrix of $r'$ is simply $\Sigma_z^{-1}$. Therefore we may regard (5.19) as a test on the
maximum element of $r'$ after each element is normalized to unit variance.

We now show that $r'$ is just a constant multiple of $r$, meaning that (5.19) is
identical to (5.16), up to a constant factor. Recall that $r = (I - HK)z$, where

$$\begin{aligned} I - HK &= I - H \Sigma_x H^T (H \Sigma_x H^T + \Sigma_e)^{-1} \\ &= (H \Sigma_x H^T + \Sigma_e - H \Sigma_x H^T)(H \Sigma_x H^T + \Sigma_e)^{-1} \\ &= \Sigma_e \Sigma_z^{-1} = \sigma_e^2 \Sigma_z^{-1}. \end{aligned}$$

Thus $r = \sigma_e^2 r'$; the two detectors are identical. □
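The equivalence in Proposition 1 can be confirmed numerically for a randomly generated model. The sketch below is an illustrative check rather than part of the proof. It computes the LNR statistic from the MMSE residual $r = (I - HK)z$, normalizing each entry by its standard deviation under the null hypothesis, and compares it with the s = 1 GLRT statistic $\max_i |(\Sigma_z^{-1} z)_i| / \sqrt{(\Sigma_z^{-1})_{ii}}$. The function names and the random test data are hypothetical.

```python
import numpy as np

def lnr_statistic(z, H, Sx, sigma2):
    """Largest normalized residue, assuming noise covariance sigma2 * I."""
    m = H.shape[0]
    Sz = H @ Sx @ H.T + sigma2 * np.eye(m)
    K = Sx @ H.T @ np.linalg.inv(Sz)
    G = np.eye(m) - H @ K
    r = G @ z                                   # MMSE residual
    cov_r = G @ Sz @ G.T                        # residual covariance under H0
    return float(np.max(np.abs(r) / np.sqrt(np.diag(cov_r))))

def glrt_s1_statistic(z, H, Sx, sigma2):
    """GLRT statistic for a single attacked meter (s = 1)."""
    m = H.shape[0]
    Sz = H @ Sx @ H.T + sigma2 * np.eye(m)
    M = np.linalg.inv(Sz)
    return float(np.max(np.abs(M @ z) / np.sqrt(np.diag(M))))
```

With $\Sigma_e = \sigma_e^2 I$ the residual covariance equals $\sigma_e^4 \Sigma_z^{-1}$, so the factor $\sigma_e^2$ cancels in the normalization and the two statistics coincide exactly, not merely up to scaling.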


5.5  Attack  Operating  Characteristics  and  Optimal  Attacks 

We now study the impact of malicious data attacks from the perspective of an
attacker. We assume that the attacker knows the (MMSE) state estimator and
the (GLRT) detector used by the control center. We also assume that the attacker
can choose s meters arbitrarily in which to inject malicious data. In practice,
however, the attacker may be much more limited. Thus our results here are
perhaps more pessimistic than in reality.

5.5.1  AOC and Optimal Attack Formulations


The attacker faces two conflicting objectives: maximizing the MSE by choosing
the best data injection a versus avoiding being detected by the control center. The
tradeoff between increasing the MSE of the state estimator and lowering the probability of
detection is characterized by the attacker operating characteristic (AOC), analogous
to the receiver operating characteristic (ROC) at the control center. Specifically,
the AOC is the probability of detection of the detector $\Pr(\delta(z) = 1 \mid a)$ as a function
of the MSE $\mathcal{E}(a) = \mathcal{E}_0 + \|Ka\|_2^2$ from (5.6) at the state estimator, where $\mathcal{E}_0$ is the
MMSE in the absence of attack.


The optimal attack in the sense of maximizing the MSE while limiting the
probability of detection can be formulated as the following constrained optimization

$$\max_{a \in \mathcal{A}_s} \|Ka\|_2^2 \quad \text{subject to} \quad \Pr(\delta(z) = 1 \mid a) \le \beta, \tag{5.20}$$

or equivalently,

$$\min_{a \in \mathcal{A}_s} \Pr(\delta(z) = 1 \mid a) \quad \text{subject to} \quad \|Ka\|_2^2 \ge C. \tag{5.21}$$


In order to evaluate the true worst-case performance for any detector, (5.20) or
(5.21) would need to be solved. This is very difficult, due to the lack of analytical
expressions for the detection probability $\Pr(\delta(z) = 1 \mid a)$. We propose a heuristic
for $\Pr(\delta(z) = 1 \mid a)$, which will allow us to approximate the above optimization
with one that is easier to solve.

5.5.2  Minimum Residue Energy Attack


Given the naive MMSE state estimator $\hat{x} = Kz$ (5.4)-(5.5), the estimation residue
error is given by

$$r = Gz, \qquad G = I - HK. \tag{5.22}$$

Substituting the measurement model, we have

$$r = GHx + Ga + Ge,$$

where Ga is the only term from the attack. Therefore, an attack vector a will be
more difficult to detect at the control center if Ga is small. Recall from (5.6) that the
damage in MSE done by injecting a is $\|Ka\|_2^2$. We therefore consider the following
equivalent problems:


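$$\max_{a \in \mathcal{A}_s} \|Ka\|_2^2 \quad \text{subject to} \quad \|Ga\|_2^2 \le \eta, \tag{5.23}$$

or equivalently,

$$\min_{a \in \mathcal{A}_s} \|Ga\|_2^2 \quad \text{subject to} \quad \|Ka\|_2^2 \ge C. \tag{5.24}$$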


The above optimizations remain difficult due to the constraint $a \in \mathcal{A}_s$. However,
given a specific sparsity pattern $S \subset \{1, \dots, n\}$ for which $a_i = 0$ for all $i \notin S$,
solving for the optimal attack vector a in the above two formulations is a standard
generalized eigenvalue problem.
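
The motivation for using the residue energy $\|Ga\|_2^2$ as a proxy for detectability can be made slightly more explicit. As a short worked step, assume that x and e are zero-mean and independent, with $\Sigma_z = H\Sigma_x H^T + \Sigma_e$ the covariance of the unattacked measurements, and that a is a deterministic vector. The cross term between Ga and G(Hx + e) then vanishes in expectation, and

$$\mathbb{E}\,\|r\|_2^2 = \mathbb{E}\,\|G(Hx + e)\|_2^2 + \|Ga\|_2^2 = \operatorname{tr}\!\left(G \Sigma_z G^T\right) + \|Ga\|_2^2.$$

Under these assumptions the attack shifts the mean residue energy by exactly $\|Ga\|_2^2$, which is the quantity constrained in (5.23) and minimized in (5.24). This is only a statement about the mean; it does not by itself determine the detection probability of any particular detector.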


In particular, for a fixed sparsity pattern S, let $a_S$ be the nonzero subvector of
a, $K_S$ the corresponding submatrix of K, and $G_S$ similarly defined. The problem
(5.24) becomes

$$\min_{u \in \mathbb{R}^{|S|}} \|G_S u\|_2^2 \quad \text{subject to} \quad \|K_S u\|_2^2 \ge C. \tag{5.25}$$

Let $Q_G = G_S^T G_S$ and $Q_K = K_S^T K_S$. It can be shown that the optimal attack pattern
has the form

$$a_S^* = \sqrt{\frac{C}{\|K_S v\|_2^2}} \, v, \tag{5.26}$$

where v is the generalized eigenvector corresponding to the smallest generalized
eigenvalue $\lambda_{\min}$ of the following matrix pencil

$$Q_G v - \lambda_{\min} Q_K v = 0.$$

The s-dimensional symmetric generalized eigenvalue problem can be solved using the
QZ algorithm [97].
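
The construction for a fixed sparsity pattern can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation used for the simulations: instead of the QZ algorithm cited above, it reduces the pencil to a standard symmetric eigenvalue problem through a Cholesky factorization of $Q_K$, which requires $Q_K$ to be positive definite (that is, $K_S$ has full column rank). The function name `min_residue_attack`, the random test matrices, and the values of `support` and `C` are all hypothetical.

```python
import numpy as np

def min_residue_attack(G, K, support, C):
    """Minimize ||G a||^2 subject to ||K a||^2 >= C, with a zero off `support`."""
    S = list(support)
    G_S, K_S = G[:, S], K[:, S]
    Q_G = G_S.T @ G_S
    Q_K = K_S.T @ K_S                      # assumed positive definite
    # With Q_K = L L^T, the pencil Q_G v = lam Q_K v becomes the standard
    # symmetric problem (L^-1 Q_G L^-T) w = lam w, where v = L^-T w.
    L = np.linalg.cholesky(Q_K)
    L_inv = np.linalg.inv(L)
    M = L_inv @ Q_G @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    lam_min = eigvals[0]
    v = L_inv.T @ eigvecs[:, 0]            # generalized eigenvector for lam_min
    a_S = np.sqrt(C) / np.linalg.norm(K_S @ v) * v   # scaling as in (5.26)
    a = np.zeros(G.shape[1])
    a[S] = a_S
    return a, lam_min

# Placeholder system (not the 14-bus network): Sigma_x = I, Sigma_e = 0.5 I.
rng = np.random.default_rng(0)
n, m = 6, 3
H = rng.standard_normal((n, m))
Sigma_z = H @ H.T + 0.5 * np.eye(n)
K = H.T @ np.linalg.inv(Sigma_z)
G = np.eye(n) - H @ K

C = 4.0
a_star, lam_min = min_residue_attack(G, K, support=[0, 2], C=C)
damage = np.sum((K @ a_star) ** 2)     # meets the MSE constraint with equality
residue = np.sum((G @ a_star) ** 2)    # equals lam_min * C
```

Because the objective and the constraint are both quadratic forms, the optimum sits on the boundary of the constraint, and the attained residue energy is $\lambda_{\min} C$. Any other vector on the same support with the same damage C has residue energy at least this large.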


5.6  Numerical  Simulations 

We present some simulation results on the IEEE 14 bus system shown in Fig. 5.1
to compare the performance of the GLRT with the J(x) test and the LNR test
[69, 70]. For various sparsity levels, we find the minimum residue energy attack as
discussed in Sec. 5.5.2. The adversary may then scale this attack vector depending
on how much it wishes to influence the mean square error. We plot both the
ROC and AOC curves for various sparsity levels and all three detectors. For the
AOC curve, we fix a probability of false alarm and vary the length of the attack
vector along the direction minimizing the residue energy, plotting the MSE vs.
the probability of detection. For the ROC curve, we fix the length of the attack
vector, but vary the detector's threshold and plot the probability of false alarm vs.
the probability of detection. In our simulations, we characterize the mean square error
increase at the control center using the ratio between the resulting MSE from the
attack and the MSE under no attack (i.e., a = 0) in dB.
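
For reference, the dB figure can be written out explicitly. Using $\mathcal{E}(a) = \mathcal{E}_0 + \|Ka\|_2^2$ from (5.6), and assuming the ratio is converted with the usual $10\log_{10}$ convention for power-like quantities (the conversion is not spelled out in the text):

$$10 \log_{10} \frac{\mathcal{E}(a)}{\mathcal{E}_0} = 10 \log_{10}\!\left(1 + \frac{\|Ka\|_2^2}{\mathcal{E}_0}\right).$$

Under this convention, an attack that raises the MSE by D dB must satisfy $\|Ka\|_2^2 = (10^{D/10} - 1)\,\mathcal{E}_0$. For example, D = 8 dB corresponds to $\|Ka\|_2^2 \approx 5.31\,\mathcal{E}_0$.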

Fig. 5.2 shows the ROC and AOC curves for the worst-case 2-sparse attack. We
implement the GLRT using exhaustive search over all possible sparsity patterns.
This is feasible because of the low sparsity level, so we need not resort to the $L_1$
minimization as in (5.14). Observe that the GLRT performs consistently better
than the other two conventional detectors.

Fig. 5.3 shows the ROC and AOC curves for the worst-case 3-sparse attack,
again using exhaustive search for the GLRT. Interestingly, the LNR test outperforms
the GLRT at this sparsity level. We believe the reason for this is that the
GLRT has little recourse when there is significant uncertainty in the sparsity pattern
of the attack. In particular, the meters being controlled by the adversary here
are the bus injection meter at bus 1 and the two meters on the transmission line
between buses 1 and 2. These constitute three of the seven meters that hold any
information about the state at bus 1. Thus, it may be difficult for the detector
to determine which of the several meters around bus 1 are the true adversarial
meters. The GLRT does not react to this uncertainty: it can only choose the most
likely sparsity pattern, which is often wrong. Indeed, in our simulations the GLRT
identified the correct sparsity pattern only 4.2% of the time.

Continuing our analysis of 3-sparse attacks, we conduct simulations in which the
adversaries are placed randomly in the network, instead of at the worst-case meters.
Once their random meters are chosen, we find the worst-case attack vector using the
residue energy heuristic. This simulates the situation in which the adversaries cannot
choose their locations, but are intelligent and cooperative in their attack. The
resulting performance of the three detectors is shown in Fig. 5.4. Observe that the
GLRT again outperforms the conventional detectors, if only slightly. When the
placement of the adversaries is random, they are not as capable of cooperating
with one another, so their attack is easier to detect.

Figure 5.2: Above: ROC Performance of GLRT for the 2 sparsity case. MSE with
attack is 8dB. SNR=10dB. Below: AOC Performance of GLRT for the 2 sparsity
case. False alarm rate is 0.05. SNR=10dB.

Figure 5.3: Above: ROC Performance of GLRT for the 3 sparsity case. MSE with
attack is 10dB. SNR=10dB. Below: AOC Performance of GLRT for the 3 sparsity
case. False alarm rate is 0.05. SNR=10dB.

Figure 5.4: Above: ROC Performance of GLRT under random attack for the 3 sparsity
case. MSE with attack is 6dB. SNR=10dB. Below: AOC Performance of GLRT
under random attack for the 3 sparsity case. False alarm rate is 0.05. SNR=10dB.

We increase the sparsity level to 6, at which exhaustive search for the GLRT is
no longer feasible. At this sparsity level, it becomes possible to perform an
unobservable attack, so it is not as illuminating to choose the worst-case sparsity
pattern, as that would be very difficult to detect. Instead, we again choose the
sparsity pattern randomly but optimize the attack within it. Fig. 5.5 compares
the performance of the GLRT implemented via $L_1$ minimization as in (5.14) to the
two conventional detectors. Note again that the GLRT outperforms the others.


Figure 5.5: ROC Performance of GLRT under random attack for the 6 sparsity case.
MSE with attack is 6dB. SNR=10dB.


Finally, we present some numerical evidence that the residue energy described
in Sec. 5.5.2 works well as a heuristic, in that it is roughly increasing with the
probability of detection $\Pr(\delta(z) = 1 \mid a)$ no matter what detector is used. For the
J(x) and LNR detectors, we consider the detection probability for all 1-sparse
vectors a satisfying $\|Ka\|_2^2 = C$ on the 14-bus test system. We plot in Fig. 5.6
the value of the residue energy vs. the true probability of detection of a for both
detectors. Observe that the scatter plots are roughly increasing.

Figure 5.6: Comparison of the residue energy heuristic with the true detection
probability for 1-sparse attack vectors for both J(x) and LNR detectors.
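
A comparison of this kind can be reproduced in miniature with a Monte Carlo experiment. The sketch below is illustrative only and makes several assumptions: the system is a small random placeholder rather than the 14-bus network, the detector is a plain threshold on the residue energy $\|r\|_2^2$ (a stand-in in the spirit of the J(x) test, not the exact detectors of the thesis), and the threshold is set empirically for a false alarm rate of 0.05. The helper `sample_residue_energy` and all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3                          # hypothetical: 8 meters, 3 state variables
H = rng.standard_normal((n, m))      # placeholder measurement matrix
sigma_e2 = 0.1                       # assumed noise variance; Sigma_x = I
Sigma_z = H @ H.T + sigma_e2 * np.eye(n)
K = H.T @ np.linalg.inv(Sigma_z)     # linear MMSE gain
G = np.eye(n) - H @ K                # residual operator

def sample_residue_energy(a, trials):
    """Draw `trials` measurements z = Hx + a + e and return ||G z||^2 for each."""
    x = rng.standard_normal((trials, m))
    e = np.sqrt(sigma_e2) * rng.standard_normal((trials, n))
    z = x @ H.T + a + e
    r = z @ G.T
    return np.sum(r ** 2, axis=1)

trials = 20000
# Empirical threshold giving roughly a 5% false alarm rate with no attack.
tau = np.quantile(sample_residue_energy(np.zeros(n), trials), 0.95)

C = 0.05                             # common damage level ||K a||^2 = C
results = []
for i in range(n):
    a = np.zeros(n)
    a[i] = np.sqrt(C) / np.linalg.norm(K[:, i])   # 1-sparse, scaled to damage C
    energy = np.sum((G @ a) ** 2)                 # residue energy heuristic
    p_detect = np.mean(sample_residue_energy(a, trials) > tau)
    results.append((energy, p_detect))
results.sort()                       # ascending residue energy

for energy, p_detect in results:
    print(f"residue energy {energy:8.4f}   detection probability {p_detect:.3f}")
```

Plotting the second column of `results` against the first gives a scatter plot of the same type as Fig. 5.6 for this toy system.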

We evaluate the performance of the residue energy heuristic on 2-sparse vectors
in the following way. For each pair of entries i, j of a, we optimize (5.24) where a
is constrained to have sparsity pattern $\{i, j\}$. We then evaluate the true probability
of detection for the two detectors, with the same parameter values as above. The
results are shown in Fig. 5.7 for the J(x) and LNR detectors. Again, the heuristic
appears to track the true probabilities reasonably well. This provides some
justification for our earlier approach, in the ROC and AOC curves, of approximating the
worst-case performance of these detectors by assuming the minimum residue energy
attack.

Figure 5.7: Comparison of the residue energy heuristic with the true detection
probability for 2-sparse attack vectors. Above: Scatter plot for the J(x) detector.
Below: Scatter plot for the LNR detector.

CHAPTER 6

CONCLUSIONS 

This thesis studied the problem of an adversary entering a network and taking
control of several nodes in it. We looked at several specific problems, and found
strategies to defeat the adversary for each. We believe that our most significant
contribution, at least for the information theory problems studied in Chapters 2-4,
is the idea that adversaries can be detected by observing joint empirical statistics.
If the statistics do not match what was expected, then a traitor must be
present. This simple idea forms the basis of Polytope Codes against adversaries
in network coding, discussed in Chapter 2, as well as the achievable strategies
against adversaries in the Slepian-Wolf problem in Chapter 3, and the
Berger-Tung-like achievable strategy against adversaries in various multiterminal source
coding problems in Chapter 4. We believe that this basic idea can be applied
to more general network information theory problems. We now make some more
specific comments on possible future directions in each of the areas.

6.1  Network  Coding 

There are numerous networks for which the results of Chapter 2 do not solve the
network coding problem under node-based adversarial attack. The main result in
Chapter 2 is Theorem 4, which states that the cut-set upper bound is achievable
for a certain class of planar graphs. It may be possible to generalize
Theorem 4, and find larger classes of networks for which the cut-set bound is
achievable. We believe that this should be possible with Polytope Codes. It would
be interesting to analyze the planarity condition in more depth: perhaps it could
lead to a more general theory of achievable rates given topological properties of
the network.

However, as we have shown, the cut-set bound is not always achievable, so to
solve the general problem, work would need to be done on upper bounds as well. From
the complicated nature of the tighter upper bound given in Sec. 2.11, we suspect
that the solution to the general problem may be very difficult, and may require
significant tools that have yet to be developed.

Perhaps the most interesting question regarding this problem is whether Polytope
Codes can achieve capacity for general networks, or at least for all one-source
one-destination problems (or perhaps even multicast). As far as we know, they
are the best known strategy for defeating adversarial attacks on network coding,
as they do at least as well as linear codes, which are used to solve most problems.


6.2  Multiterminal  Source  Coding 

The results of Chapter 3 give tight bounds on the set of achievable rates for various
forms of the Slepian-Wolf problem. Therefore we do not believe there is much
additional work that could be done in that area. However, the more general
multiterminal source coding problems studied in Chapter 4 are wide open. Much more
work could be done on these problems in the presence of an adversary. One must
tread carefully, however, because many multiterminal source coding problems are
open even without adversaries, so there seems to be little hope of finding tight
results with adversaries. This was the reason that we chose problems to study in
Chapter 4 that had been completely solved in the no-adversary case, in the hope
that they could also be solved with adversaries. We provided bounds for these
problems in Chapter 4, but did not quite solve them. We conjecture that the
inner bounds resulting from our Berger-Tung-like achievable scheme in Theorem 9
are tight for both the error exponent of the discrete CEO Problem and the
rate-distortion region for the quadratic Gaussian CEO Problem, but we were unable to
prove either.


6.3  Power  System  Sensing  and  Estimation 

Study of malicious data attacks on power systems is still in its infancy. Chapter 5
exclusively studied the effect of these attacks on state estimation. The data taken
by meters in the power system is used for other purposes as well, and it may be more
interesting to study the effect of malicious data attacks on these. What is primarily
missing from Chapter 5 is a sense of what the results of these attacks are. For example,
can they cause a blackout? The answer is unclear, because all we know is that
they may increase the mean square error of the state estimate. How this affects
the operation of the power grid depends on how the state estimate is employed
to make decisions at the control center. Indeed, it is often the case that control
decisions are made directly from measurements, without being processed by the
state estimator; this could induce further dangers if corrupted measurements are
not even corroborated against other measurements.

Another application of power measurements relates to the pricing of power
in the network. If measurements strongly influence the compensation of generators,
there may be a strong economic incentive to manipulate them to one's own
advantage.

Finally, phasor measurement units (PMUs) are increasingly being installed at
busses in the power grid [98]. These allow much higher-quality measurements
of voltage levels than have previously been available, including, for the first time,
phase differences between busses. How this new wealth of data may affect the
problem of malicious data attacks is as yet unclear.